Text-based inference chaining

ABSTRACT

A method, system and computer program product for generating inference graphs over content to answer input inquiries. First, independent factors are produced from the inquiry, and these factors are converted to questions. The questions are then input to a probabilistic question answering system (PQA) that discovers relations, which are used to iteratively expand an inference graph starting from the factors and ending with possible answers. A probabilistic reasoning system is used to infer the confidence in each answer by, for example, propagating confidences across relations and nodes in the inference graph as it is expanded. The inference graph generator system can be used to simultaneously generate forward and backward inference graphs bi-directionally, using a depth controller component to limit the generation of both paths if they do not meet. Otherwise, a joiner process forces the discovery of relations that may join the answers to factors in the inquiry.

BACKGROUND

The present disclosure generally relates to information retrieval and, more specifically, to automated systems that provide answers to questions or inquiries.

Generally, there are many types of information retrieval and question answering systems, including expert or knowledge-based (KB) systems, document or text search/retrieval systems and question answering (QA) systems.

Expert or knowledge-based systems take in a formal query or map natural language to a formal query and then produce a precise answer and a proof justifying the answer based on a set of formal rules encoded by humans.

Document or text search systems are not designed to deliver and justify precise answers. Rather, they produce snippets or documents that contain key words or search terms entered by a user, for example, via a computing system interface, e.g., a web browser. There is no expectation that the results provide a solution or answer. Text search systems are based on the prevailing and implicit assumption that all valid results to a query are documents or snippets that contain the keywords from the query.

QA systems provide a type of information retrieval. Given a collection of documents (such as the World Wide Web or a local collection), a QA system may retrieve answers to questions posed in natural language. QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval, such as document retrieval, and QA is sometimes regarded as the next step beyond search engines.

Traditional QA systems deliver precise answers, unlike document search systems, but do not produce paths of justifications like expert systems. Their justifications are “one-step”, meaning that they provide an answer by finding one or more passages that alone suggest that a proposed or candidate answer is correct.

It would be highly desirable to provide a system and method that can answer complex inquiries that search systems, classic expert/KB systems and simpler QA systems cannot handle.

SUMMARY

Embodiments of the invention provide a method, system and computer program product that can answer complex inquiries that search systems, classic expert/KB systems and simpler QA systems cannot handle.

In one aspect, there is provided a system, method and computer program product for inferring answers to inquiries. The method comprises: receiving an input inquiry; decomposing the input inquiry to obtain one or more factors, the factors forming initial nodes of an inference graph; iteratively constructing the inference graph over content from one or more content sources, wherein at each iteration, a processing device discovers solutions to the input inquiry by connecting factors to solutions via one or more relations, each relation in an inference graph being justified by one or more passages from the content, the inference graph connecting factors to the solutions over one or more paths having one or more edges representing the relations; and providing a solution to the inquiry from the inference graph, wherein a programmed processor device is configured to perform one or more of the receiving, decomposing and iteratively constructing steps to provide the solution.

In a further aspect, a method of inferring answers to inquiries comprises: receiving an input inquiry; decomposing the input inquiry to obtain one or more factors; decomposing the input inquiry into query terms and using the query terms to obtain one or more candidate answers for the input inquiry; iteratively constructing, using a programmed processor device coupled to a content storage source having content, a first inference graph using the factors as initial nodes of the first inference graph, the constructed first inference graph connecting factors to one or more nodes that lead to an answer for the inquiry over one or more paths having one or more edges representing the relations; simultaneously iteratively constructing, using the programmed processor device and the content source, a second inference graph using the one or more candidate answers as initial nodes of the second inference graph, the second inference graph connecting candidate answers to one or more nodes that connect to the one or more factors of the inquiry over one or more paths having one or more edges representing relations; and generating, during the simultaneous iterative constructing, a final inference graph by joining the first inference graph to the second inference graph, the final inference graph having a joined node representing a solution to the input inquiry.

In a further aspect, a system for inferring answers to inquiries comprises: one or more content sources providing content; and a processor device for coupling to the content sources and configured to: receive an input inquiry; decompose the input inquiry to obtain one or more factors, the factors forming initial nodes of an inference graph; iteratively construct the inference graph over content from one or more content sources, wherein at each iteration, the processing device discovers solutions to the input inquiry by connecting factors to solutions via one or more relations, each relation in an inference graph being justified by one or more passages from the content, the inference graph connecting factors to the solutions over one or more paths having one or more edges representing the relations; and provide a solution to the inquiry from the constructed inference graph.

In a further aspect, there is provided a system for inferring answers to inquiries comprising: one or more content sources providing content; and a programmed processor device for coupling to the content sources and configured to: receive an input inquiry; decompose the input inquiry to obtain one or more factors; decompose the input inquiry into query terms and use the query terms to obtain one or more candidate answers for the input inquiry; iteratively construct a first inference graph using the factors as initial nodes of the first inference graph, the constructed first inference graph connecting factors to one or more nodes that lead to an answer for the inquiry over one or more paths having one or more edges representing the relations; simultaneously iteratively construct a second inference graph using the one or more candidate answers as initial nodes of the second inference graph, the second inference graph connecting candidate answers to one or more nodes that connect to the one or more factors of the inquiry over one or more paths having one or more edges representing relations; and generate, during the simultaneous iterative constructing, a final inference graph by joining the first inference graph to the second inference graph, the final inference graph having a joined node representing a solution to the input inquiry.

A computer program product is provided for performing operations. The computer program product includes a storage medium readable by a processing circuit and storing instructions run by the processing circuit for running methods. The methods are the same as listed above.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects, features and advantages of the invention are understood within the context of the Detailed Description, as set forth below. The Detailed Description is understood within the context of the accompanying drawings, which form a material part of this disclosure, wherein:

FIG. 1A illustrates conceptually an inference graph, generated and used by an embodiment of the inference chaining system and method, including an interconnection of nodes by arcs or graph edges;

FIG. 1B shows an illustrative example of a generated inference graph in which a relation is represented by the edge between nodes;

FIG. 2 illustrates a high-level schematic of the text-based inference chaining system and method;

FIGS. 3A-3B illustrate a text-based inference chaining methodology performed by the text-based inference chaining system of the embodiments described herein;

FIG. 4 illustrates a high-level schematic of the text-based inference chaining system and method employing one or more computing devices that perform an iterative process;

FIG. 5 illustrates a further embodiment of the text-based inference chaining system and method 100′ including additional relation injection components;

FIG. 6 illustrates a further embodiment of the text-based inference chaining system and method 100″ including a node filtering component;

FIG. 7 illustrates an example of a multi-step inference graph generation given an input question;

FIG. 8 illustrates an embodiment of the factor analysis component of the text-based inference chaining system and method;

FIG. 9 illustrates a further detailed embodiment of the Question Generation component implementing a Relation Injection component to generate natural language questions from an input Inquiry;

FIG. 10 shows an implementation of a reasoner component receiving as input an inference graph with some events;

FIGS. 10A-10F show an example implementation of reasoner component processes for a medical domain inquiry example;

FIG. 11 shows depth controller processes that analyze the current updated inference graph at each iteration and decide whether the graph should be considered final and the process should halt;

FIG. 12 illustrates the text-based inference chaining system and method employing a bi-directional graph generation inquiry solution strategy;

FIG. 13 illustrates a factor-directed or forward-directed inference graph generation iterative process which functions identically to the programmed text-based inference chaining system and method;

FIG. 14 illustrates a hypothesis-directed inference graph generation iterative process implementing a candidate answer generator to produce initial nodes in a backward inference graph;

FIG. 15 illustrates the implementation of an inference graph joiner process to merge nodes and/or join the respective generated forward- and backward-directed graphs;

FIG. 16 depicts an example node joiner process for combining the bi-directionally generated inference graphs by looking for relations between end-point nodes of the forward-directed graph and a node in the backward-directed graph;

FIGS. 17A-17B illustrate one example implementation of an Inference Graph generator according to the embodiments described herein;

FIG. 18 shows a further embodiment of the inference chaining system and method including a parallel, simultaneous implementation of PQA Systems;

FIG. 19 shows a system diagram depicting a high-level logical architecture and methodology of an embodiment of each PQA system of FIG. 18; and

FIG. 20 illustrates an exemplary hardware configuration of a computing system 401 in which the present system and method may be employed.

DETAILED DESCRIPTION

The present disclosure is directed to an automated reasoning system and, particularly, an inference graph generator system and methodology for automated answering of complex inquiries that is fundamentally different from all prior expert systems, knowledge-based systems, or automated reasoning systems.

In one aspect, the inference graph generator system and methodology may function entirely over unstructured content (e.g., text) and, unlike prior systems, does not require the manual encoding of domain knowledge in the form of formal rules (if-then), axioms or procedures of any kind. Rather, the system and methods discover paths from the inquiry to answers by discovering, assessing and assembling justifications from as-is natural language content. Such content is written for humans by humans and never requires a knowledge engineer to formalize knowledge for the computer. This makes the system and method a powerful reasoning system.

The inference graph generator system and methodology operates by providing an explanation of a precise answer based on an inference graph that provides a multi-step path from elements in the query to answers or solutions.

The inference graph generator system and methodology discovers and justifies a multi-step path from the query to precise answers by iteratively leveraging a probabilistic text-based QA system component and a general probabilistic reasoner component. The present system and method combines these components to produce justified inference graphs over natural language content.

More particularly, as described in greater detail herein below, in one embodiment, the inference graph generator system and methodology combines probabilistic QA, to discover answers and justifications, with Bayesian-type inference, to propagate confidence, in order to build inference graphs that justify multi-step paths from factors to answers.

The following definitions are used herein:

A Natural Language Inquiry is a statement or question in unrestricted natural language (e.g., English) that describes a problem, case or scenario in search of an answer or solution. One example is a simple question in search of a simple answer, like “This man sailed across the Atlantic to India and discovered America.” or “Who sailed across the Atlantic . . . ?” A further example includes a complex description of problems, like a patient's history, where a diagnosis, treatment or other result is sought. For example: A 40-year-old female has pain on and off after eating fatty food. She has pain in the epigastric region and sometimes on the right side of her abdomen. After assessing the patient you order ultrasound of the gallbladder. The ultrasound shows presence of gallstones (choledocholithiasis) but no evidence of cholecystitis. The patient goes for an elective cholecystectomy. Pathological examination of the gallbladder showed 3 mixed types of gallstones. The gallbladder mucosa is expected to reveal what change?

A Factor is a logically independent element of an inquiry. Examples are: “sailed across the Atlantic”, “discovered America”, “Patient is 40 years old” and “has pain on and off after eating fatty food”.

A Relation is a named association between two concepts. General examples: A “indicates” B, A “causes” B, A “treats” B, A “activates” B, A “discovered” B. The concepts are considered the “arguments” or “end points” of the relation. Concepts are represented by named entities (Washington) or simply phrases (chain smoking). Domain-specific examples (in predicate argument form): author of (Bram Stoker, Dracula), president of (Obama, US), causes (smoking, lung cancer), treats (aspirin, stroke).

A Question is a single sentence or phrase in natural language (e.g., English) or a formal language (e.g., first-order logic) that intends to ask for the end point(s) of a relation or to ask whether or not a relation between two concepts is true. Examples are:

“What does aspirin treat?” / treat(aspirin, X); “Does aspirin treat strokes?” / treat(aspirin, strokes).

A Statement is a natural language expression, a structured relation, or a semi-structured relation. Statements are often used to represent factors and may come from structured or unstructured content. Some non-limiting examples:

Patient's hemoglobin concentration is 9 g/dL

“low hemoglobin concentration” (Patient)

Has Condition(Patient, anemia)

The patient's mother was diagnosed with breast cancer at the age of 35

An Answer or Solution is an element of text: a word, number, phrase, sentence, passage or document. An answer is thought to be correct or partially correct with respect to a question or inquiry if a human considers it a useful response to the question or inquiry. In the case of a simple question or relation, the answer is typically the sought-after end point of the relation. For example, for “Who discovered America in 1492?” the answer is the missing concept X in the relation “X discovered America”.

Unstructured Content is textual data (e.g., books, journals, web pages, documents, etc.) and is typically used as a source for answers and as a source for justifications of those answers. It is further used to justify or evidence the answer to a question or, more specifically, the truth of a relation (note: non-text content may also be considered to determine this). More generally, unstructured content may refer to a combination of text, speech and images.

Structured Content is any database or knowledge base where data is encoded as structured relations. A relational database is a typical example, as is a logic-based knowledge base.

Content is any combination of unstructured and structured content.

A Passage is a sequence of natural language text: one or more phrases, sentences or paragraphs. Passages are usually made up of 1-5 sentences.

A Justifying Passage is a passage thought to explain or justify why an answer may be correct for a given question.

Confidence is an indication of the degree to which a relation is believed true, e.g., a measure of certainty or probability that a relation is true. It is usually represented as a number. It may, but does not necessarily, represent a probability.

An Inference Graph is any graph represented by a set of nodes connected by edges, where the nodes represent statements and the edges (arcs) represent relations between statements. Each relation may be associated with a confidence, and each concept in a relation may be associated with a confidence. Each edge is associated with a set of passages providing a justification for why that relation may be true. Each passage justifying an edge may be associated with a confidence indicating how likely it is that the passage justifies the relation. An inference graph is used to represent relation paths between factors in an inquiry and possible answers to that inquiry. An inference graph is multi-step if it contains more than one edge in a path from a set of factors to an answer. In one embodiment, graph nodes, edges/attributes (confidences), statements and relations may be represented in software as Java objects. Confidences, strengths and probabilities are attached to them for processing by various computer systems.
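The definition above can be made concrete with a small data model. The source mentions Java objects in one embodiment; the sketch below uses Python for brevity, and every class and method name in it (`Passage`, `Edge`, `InferenceGraph`, `path_lengths`, `is_multi_step`) is hypothetical. It attaches a confidence and justifying passages to each edge, and tests the multi-step property by enumerating cycle-free paths from the factors to a given answer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Passage:
    """A passage justifying a relation, with how strongly it supports it."""
    text: str
    confidence: float


@dataclass
class Edge:
    """A named relation between two statements, e.g. causes(A, B)."""
    source: str
    relation: str
    target: str
    confidence: float
    passages: List[Passage] = field(default_factory=list)


@dataclass
class InferenceGraph:
    factors: Set[str] = field(default_factory=set)
    edges: List[Edge] = field(default_factory=list)
    node_confidence: Dict[str, float] = field(default_factory=dict)

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)
        for node in (edge.source, edge.target):
            # Factors start as certain; other nodes wait for the reasoner.
            self.node_confidence.setdefault(node, 1.0 if node in self.factors else 0.0)

    def path_lengths(self, answer: str) -> List[int]:
        """Edge counts of every cycle-free path from any factor to `answer`."""
        lengths: List[int] = []

        def walk(node: str, visited: List[str]) -> None:
            if node == answer:
                lengths.append(len(visited) - 1)
                return
            for edge in self.edges:
                if edge.source == node and edge.target not in visited:
                    walk(edge.target, visited + [edge.target])

        for factor in sorted(self.factors):
            walk(factor, [factor])
        return lengths

    def is_multi_step(self, answer: str) -> bool:
        """True if some factor-to-answer path uses more than one edge."""
        return any(n > 1 for n in self.path_lengths(answer))


g = InferenceGraph(factors={"high blood sugar"})
g.add_edge(Edge("high blood sugar", "causes", "diabetes", 0.8,
                [Passage("example justifying passage", 0.7)]))
g.add_edge(Edge("diabetes", "causes", "blindness", 0.6))
print(g.path_lengths("blindness"))  # [2]
print(g.is_multi_step("diabetes"))  # False
```

The passage text and confidences are placeholders for illustration, not output of any PQA system.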

A PQA System (Probabilistic QA System) is any system or method that produces answers to questions and may associate those answers with confidences indicating the likelihood the answers are correct, and that may associate answers with a passage-based justification intended to explain to humans why the answer is likely correct.

FIG. 1A illustrates conceptually an inference graph generated and used by the programmed inference chaining system and method of the present invention. As shown, inference graph 75 includes an interconnection of nodes 78a, 78b, 78c by arcs or graph edges 80. In the inference graph 75 of FIG. 1A, nodes 78a, 78b are interconnected by an edge 80 representing a relation. As shown, each edge or relation 80 includes a set of annotations 85, the set including one or more associated justifying passages.

FIG. 1B shows an illustrative example of a generated inference graph 88 in which a full statement is implied in all nodes, i.e., “Patient has High Blood Sugar” is implied from node 79a, “Patient has Diabetes” is implied from node 79b, etc. The relation represented by the edge between nodes 79a, 79b is a causal relation, i.e., a patient having High Blood Sugar may cause the Diabetes represented by node 79b.

FIG. 2 illustrates a high-level schematic of a system and method employing the text-based inference chaining system and method 100. In one aspect, text-based inference chaining system and method 100 receives a natural language inquiry 101, retrieves/accesses unstructured content 105, and generates an inference graph 110. Particularly, natural language query 101 is an “inquiry”, which is more broadly defined than a typical question. The inquiry may be a rich series of statements or sentences that are true about a solution or answer. The inquiry may or may not contain a direct question. Text-based inference chaining system and method 100 employs the PQA system and a reasoner to discover how one can get from factors in the original inquiry to possible answers through a path of relations justified by different elements (e.g., passages) from the content 105. The generated inference graph 110 is analogous to the multi-step “proof” of a traditional expert system, but it does not require a “rule-base”; it operates over the kind of content typically used by text-based QA systems. The inference graph 110 shows how one can get from elements (i.e., factors) in the original inquiry to possible answers through a multi-step path of relations, each justified by different passages from the content. It is understood that the inference chaining system and method 100 may include an implementation having different combinations of the embodiments described herein with respect to FIGS. 4, 5 and 6.

FIG. 4 illustrates a high-level schematic of the text-based inference engine 100. The text-based inference chaining system and method 100 is a computer system employing one or more computing devices that perform an iterative process 99 that generates a final inference graph 110F given an input inquiry 101, a set(s) of factors, and determined relations. The text-based inference chaining system and method 100 first implements a factor analysis component 104 implementing programmed processes to extract factors 106 from the input inquiry 101. Factor analysis component 104 is described in greater detail herein with respect to FIG. 8. Programmed processes further generate an initial inference graph 110I using factors extracted from the inquiry. This initial inference graph 110I may include only the factors 106 extracted from the inquiry as initial end-points or nodes. This initial inference graph 110I may be stored as data in a storage device 107. As will be described in greater detail, the iterative processes 99 further discover relations from the factors 106 to a new set of concepts that may lead to answers or solutions.

In one aspect, the text-based inference chaining system and method 100 provides a system and method that discovers and justifies answers to inquiries by constructing inference graphs over content connecting factors to answers, such that each relation in an inference graph is justified by one or more passages from the content and where the inference graph may connect factors to answers over a path containing one or more edges (i.e., a multi-step inference graph).

At the start of each iteration, from the generated initial inference graph 110I (or a generated updated inference graph 110U to be extended in a subsequent iteration), a question generator 112 implements a programmed process to first generate questions for the PQA system 115 to answer. As revised inference graphs are generated at each iteration, new questions may be generated for the PQA system to answer. Particularly, at each iteration, for every new end-point of every new relation in the inference graph, the question generator 112 formulates one or more questions for the PQA system to answer. Question generator component 112 is described in greater detail herein with respect to FIG. 9. The parallel-implemented PQA system 115 receives the formulated questions based on the prior inference graph, e.g., graph 110P. Based on the number of independent questions generated, one or more PQA systems may be called in parallel to discover new relations that answer the questions. The PQA system is a type of natural language question-answering system that takes in a natural language (NL) question and returns a set of possible answers, a confidence score for each answer indicating a probability that the answer is correct, and, for each answer, a set of justifying passages extracted from the body of content that provide evidence for why the answer may be correct. In one embodiment, the IBM DeepQA system may be implemented as the PQA system 115. For a description of IBM DeepQA, refer to the description of FIG. 19. Other QA systems that may be implemented as possible embodiments of the PQA system are Javelin (CMU), Ephyra (CMU and open source), START (MIT) and Wolfram Alpha (Wolfram). Each of these attempts to produce precise answers to natural language questions, but they vary in their ability to produce confidence scores and justifying passages.

The PQA system 115 performs processes to obtain or discover new relations 116 that answer the questions from the structured or unstructured content 105. The discovered new relations 116 additionally include confidences and may be stored as data in a storage device 117, which may be or include the storage device 107.

As further shown in FIG. 4, in a current iteration, a graph extender component 118 implements programmed processes to receive the stored new relations and confidences data 117 and to extend the previous inference graph 110P generated in the immediately prior iteration (which is 110I at the first iteration) based on the new relations and confidences data 117. Particularly, graph extender 118 receives the new relations and confidences 117 and processes the new relations by merging them into the previous inference graph 110P, resulting in a new extended inference graph 110E that is output from the graph extender 118 and may be stored as data in a storage device 107.

More particularly, the graph extender 118 takes as input the previous inference graph 110P and a set of new relations 116 discovered by the PQA component, and outputs a new inference graph 110E that includes the new relations. It does this by merging nodes in the input inference graph with nodes in the new relations and adding them to the graph. An example follows:

Input inference graph: A→B→C
Input new relations: C1→D
Output: A→B→(C/C1)→D

where C and C1 were merged (considered the same node). The computed confidence on C/C1→D is the same confidence produced by the PQA system 115 in its answer to the question about C that produced C1→D.

In one embodiment, merging nodes may be implemented using some form of “specialization”. For example, if C was “diabetes” and D was “blindness”, the question generated was “What causes blindness?”, and the PQA system produced the relation “diabetes mellitus causes blindness”, then the graph extender 118 would merge “diabetes” with “diabetes mellitus”. In this case, the embodiment may merge nodes only if they are identical or if the answer is connected to a more specific concept. Thus, “diabetes” would merge with “diabetes” or with “diabetes mellitus”. At this point, confidences are not re-propagated over the extended graph 110E, as this is performed by the reasoner component 150.
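The merge rule above can be sketched as follows. The source does not say how “more specific concept” is detected; this sketch assumes a deliberately crude word-prefix test (the candidate repeats the existing node's leading words), where a real system would more likely consult an ontology or lexical resource. The names `is_same_or_more_specific` and `extend_graph` are hypothetical. Confidences are copied unchanged from the PQA answer, leaving re-propagation to the reasoner.

```python
from typing import Dict, List, Tuple

Relation = Tuple[str, str, float]  # (source, target, confidence from the PQA system)


def is_same_or_more_specific(existing: str, candidate: str) -> bool:
    """Hypothetical stand-in for specialization: the candidate equals the existing
    concept or extends its leading words (e.g. 'diabetes' -> 'diabetes mellitus')."""
    e, c = existing.lower().split(), candidate.lower().split()
    return c[: len(e)] == e


def extend_graph(edges: List[Relation], new_relations: List[Relation]):
    """Merge new relation end-points into existing nodes, then add the relations.

    Returns the extended edge list and a map from merged labels to graph nodes.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    aliases: Dict[str, str] = {}
    extended = list(edges)

    def resolve(label: str) -> str:
        for node in nodes:
            if is_same_or_more_specific(node, label):
                if node != label:
                    aliases[label] = node  # e.g. 'diabetes mellitus' -> 'diabetes'
                return node
        nodes.append(label)  # no match: the end-point becomes a new node
        return label

    for src, tgt, conf in new_relations:
        # Confidence is kept exactly as the PQA system reported it.
        extended.append((resolve(src), resolve(tgt), conf))
    return extended, aliases


graph = [("A", "B", 0.9), ("B", "diabetes", 0.8)]
new = [("diabetes mellitus", "blindness", 0.7)]
extended, aliases = extend_graph(graph, new)
print(extended[-1])  # ('diabetes', 'blindness', 0.7)
print(aliases)       # {'diabetes mellitus': 'diabetes'}
```

Because the test only merges toward a more general existing node, “diabetes mellitus” in the graph would not absorb a later “diabetes” answer, matching the one-directional rule described above.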

As shown in FIG. 4, the reasoner component 150 performs programmed processes to propagate computed confidences across the relations and output an updated (for the current iteration) inference graph 110U with a particular confidence level assured across the relations. That is, as part of the reasoner process, additional pruning may be performed, as certain relation confidences generated by the PQA system may drop below a set threshold. The reasoner may also merge relations based on similarity metrics.

The reasoner component 150 is described in greater detail herein with respect to FIGS. 10 to 10D. In one embodiment, the reasoner component 150 receives as input: (1) a set of relations between inference graph nodes, (2) factors, and (3) candidate solutions or answers; and outputs a probability for each node in the inference graph. The reasoner component 150 may also optionally output an explanation of why the answer is correct. Any algorithm that has these inputs and outputs can function as a reasoner component 150, as further described below with respect to FIGS. 10 to 10D.
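Since any algorithm with these inputs and outputs can serve as the reasoner, the sketch below uses noisy-OR propagation as one simple stand-in; it is not the reasoner of FIGS. 10 to 10D. Assumptions: the relations form a directed acyclic graph, factors are certain (probability 1.0), and each relation independently “fires” with probability P(source) * confidence, so a node is false only if every incoming relation fails. The function name `propagate` is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Relation = Tuple[str, str, float]  # (source, target, relation confidence)


def propagate(relations: List[Relation], factors: Set[str], answers: Iterable[str]):
    """Noisy-OR confidence propagation; returns (node probabilities, ranked answers)."""
    parents: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    children: Dict[str, List[str]] = defaultdict(list)
    indegree: Dict[str, int] = defaultdict(int)
    nodes = set(factors)
    for src, tgt, conf in relations:
        parents[tgt].append((src, conf))
        children[src].append(tgt)
        indegree[tgt] += 1
        nodes.update((src, tgt))

    prob: Dict[str, float] = {}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    while queue:  # topological order: every parent is final before its children
        node = queue.popleft()
        if node in factors:
            prob[node] = 1.0
        else:
            all_fail = 1.0
            for parent, conf in parents[node]:
                all_fail *= 1.0 - prob[parent] * conf
            prob[node] = 1.0 - all_fail
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Nodes on a cycle are never dequeued; they are absent from `prob`
    # and ranked as if their probability were 0.0.
    ranked = sorted(answers, key=lambda a: prob.get(a, 0.0), reverse=True)
    return prob, ranked


relations = [
    ("high blood sugar", "diabetes", 0.8),
    ("blurred vision", "diabetes", 0.5),
    ("diabetes", "retinopathy", 0.6),
]
prob, ranked = propagate(relations, {"high blood sugar", "blurred vision"},
                         ["retinopathy", "diabetes"])
print(round(prob["diabetes"], 2), round(prob["retinopathy"], 2))  # 0.9 0.54
print(ranked)  # ['diabetes', 'retinopathy']
```

Two independent supporting relations raise “diabetes” above either one alone (1 - 0.2 * 0.5 = 0.9), which is the kind of evidence accumulation the reasoner is meant to provide; the relation confidences here are illustrative values.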

Returning to FIG. 4, a depth controller component 175 performs processes to receive the new updated inference graph 110U and determine whether to halt the iteration based on a specified depth or other criteria. The depth controller component 175 provides the ability for the inference chaining system and method to iteratively extend the initial inference graph formed from the original factors output by factor analysis. Because this iterative process would continue to grow the graph unless stopped, the depth controller component 175 provides the ability to halt it based on a specified depth or other criteria.

The depth controller component 175 is described in greater detail with respect to FIG. 11. At each iteration, the depth controller component 175 performs a method to analyze the current updated inference graph 110U and decide whether the graph should be considered final and the process halted. The depth controller may be implemented in a variety of ways. For example, the depth controller may look for a pre-determined depth represented by an integer considered the “Depth Threshold” (DT) value, e.g., a DT value of 2. In this example, once a graph has extended two steps (relations) from the original factors, the iteration stops and the graph is output as final. Another embodiment may consider a “Confidence Threshold” (CT) value, e.g., determining whether there is a node in graph 110U that has confidence >= CT. In this example, the depth controller 175 would halt the iteration and output the graph 110U as a final inference graph 110F if it contained any node associated with a confidence at or above the given CT value. Any combination of depth and confidence thresholds may be used in an embodiment of the depth controller 175. For example, the system may halt and output the final graph when the depth controller detects that the graph has reached a certain depth or that it contains a high-confidence node, whichever comes first.
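A minimal sketch of the combined DT/CT rule (“whichever comes first”) might look like this. It assumes depth is measured as the largest breadth-first distance, in relations, from the original factors, and that the confidence test ignores the factors themselves (which would otherwise trip CT immediately if treated as certain). The function names and default threshold values are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def graph_depth(edges: Iterable[Tuple[str, str]], factors: Set[str]) -> int:
    """Relation steps from the original factors to the farthest reachable node."""
    adjacency: Dict[str, List[str]] = {}
    for src, tgt in edges:
        adjacency.setdefault(src, []).append(tgt)
    distance = {f: 0 for f in factors}
    queue = deque(factors)
    while queue:  # breadth-first search yields the shortest step count to each node
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return max(distance.values(), default=0)


def should_halt(edges, factors: Set[str], node_confidence: Dict[str, float],
                depth_threshold: int = 2, confidence_threshold: float = 0.9) -> bool:
    """Halt when the graph reaches DT steps or a non-factor node reaches CT."""
    if graph_depth(edges, factors) >= depth_threshold:
        return True
    return any(conf >= confidence_threshold
               for node, conf in node_confidence.items() if node not in factors)


factors = {"A"}
print(should_halt([("A", "B")], factors, {"A": 1.0, "B": 0.5}))             # False
print(should_halt([("A", "B"), ("B", "C")], factors, {"B": 0.5, "C": 0.4}))  # True: DT reached
print(should_halt([("A", "B")], factors, {"A": 1.0, "B": 0.95}))            # True: CT reached
```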

Returning to FIG. 4, if a need to halt the iteration is determined, the updated inference graph 110U is output as the final inference graph 110F and stored in a storage device 107. At that point, final inference graph 110F will include a set of nodes and relations 126 that satisfy the depth or confidence criterion. Otherwise, the updated inference graph 110U is to be extended and is provided as input to question generator component 112 as a new inference graph of nodes and relations for the next iteration 99.

FIG. 5 illustrates a further embodiment of the text-based inference chaining system and method 100′ including additional relation injection components. To make the inference chaining system and method more modular and extensible, a relation type injection component 130 may be introduced that separates the logic of forming a natural language question for the PQA system from the relation types used to seed those questions. The relation type injection component 130 determines what relation type or types 135 should be asked for, given a particular node.

Generally, the relation type injection component 130 receives the initial inference graph 110I and considers the inquiry and the set of initial factors 106 to determine a set of seed relations or relation types 135 for use by the question generation component 112. The question generation component 112 is parameterized to allow for the independent provision of a set of relation types 135. These are then used as seeds for generating questions for the PQA system 115.
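The separation described above (relation types supplied independently, question phrasing kept in the generator) can be sketched with a template table. The templates, the relation type names and the `generate_questions` function are all hypothetical; each question is paired with a formal query in the treat(aspirin, X) style used in the definitions, with X as the sought-after end point.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical templates: one natural language phrasing per injected relation type.
QUESTION_TEMPLATES: Dict[str, str] = {
    "causes": "What does {node} cause?",
    "indicates": "What does {node} indicate?",
    "treats": "What does {node} treat?",
}


def generate_questions(new_nodes: Sequence[str], relation_types: Sequence[str],
                       templates: Dict[str, str] = QUESTION_TEMPLATES
                       ) -> List[Tuple[str, str]]:
    """Return (natural language question, formal query) pairs for every new
    end-point and every injected relation type; X marks the open argument."""
    questions: List[Tuple[str, str]] = []
    for node in new_nodes:
        for relation in relation_types:
            template = templates.get(relation)
            if template is None:
                continue  # no phrasing known for this relation type; skip it
            questions.append((template.format(node=node), f"{relation}({node}, X)"))
    return questions


print(generate_questions(["aspirin"], ["treats"]))
# [('What does aspirin treat?', 'treats(aspirin, X)')]
```

Adding a relation type then means adding a template entry, without touching the generation loop, which is the modularity the relation type injection component is meant to provide.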

FIG. 6 illustrates a further embodiment of the text-based inference chaining system and method 100″ including a node filtering component 140 for selecting statements and removing them from further consideration in the generation of the inference graph to improve the efficiency of the process. Generally, the node filtering component 140 receives the new relations and confidences 126 and the previous inference graph 110P data content. Because the PQA system 115 outputs many proposed relations with varying confidences, the node filtering component 140 implements processes to remove some of the new nodes (i.e., new relation end-points) from consideration based on a variety of pruning algorithms. A simple pruning algorithm may involve providing a confidence threshold cut-off. In this embodiment, a subset of the new nodes 142 would be used to extend the inference graph 110P by graph extender component 118.
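The simple confidence-threshold cut-off mentioned above might be sketched as follows. The function name `filter_new_nodes` and the default threshold of 0.3 are hypothetical; the sketch keeps every relation at or above the threshold and reports which previously unseen end-points survive to extend the graph.

```python
from typing import Iterable, List, Set, Tuple

Relation = Tuple[str, str, float]  # (source, new end-point, PQA confidence)


def filter_new_nodes(new_relations: Iterable[Relation], known_nodes: Set[str],
                     threshold: float = 0.3) -> Tuple[List[Relation], List[str]]:
    """Drop relations below `threshold`; return the survivors and the new
    end-points (not already in the graph) they introduce."""
    kept = [rel for rel in new_relations if rel[2] >= threshold]
    new_nodes = sorted({target for _, target, _ in kept} - known_nodes)
    return kept, new_nodes


proposed = [("B", "C", 0.72), ("B", "D", 0.05), ("B", "A", 0.4)]
kept, new_nodes = filter_new_nodes(proposed, known_nodes={"A", "B"})
print(kept)       # [('B', 'C', 0.72), ('B', 'A', 0.4)]
print(new_nodes)  # ['C']
```

Relations that point back to existing nodes (here B to A) are kept, since they may still add evidence, but they do not add to the frontier of nodes for which new questions are generated.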

FIG. 3A illustrates a text-based inference chaining methodology 1000 performed by the text-based inference chaining system 100, 100′, 100″ of FIGS. 4-6. As shown at a first step 1003, the following are performed: receiving, at the inference-based chaining system, an input inquiry; decomposing the input inquiry to obtain one or more factors using NLP text analysis, factor identification and factor weighting; and forming initial nodes of an inference graph. Then, at 1005, processes are performed to iteratively construct the inference graph over one or more content sources, wherein at each iteration the computer-implemented, text-based inference chaining system discovers answers to the input inquiry by connecting factors to the answers via one or more relations, each relation in the inference graph being justified by one or more passages from the content sources. The inference chaining process connects the factors to the solutions in the inference graph over one or more paths having one or more edges representing the inferred relations. Finally, at 1010, the text-based inference chaining method provides, from the inference graph, the solution to the inquiry having the highest confidence (as represented by a computed probability value).

FIG. 3B is a detailed flow chart illustrating the inference graph generation step 1005 of FIG. 3A. As shown in FIG. 3B, at 1050, the text-based inference chaining methodology 1000 performed by the text-based inference chaining system 100, 100′, 100″ of FIGS. 4-6 enters an iterative loop, where at a first step 1055 one or more questions are generated based on one or more current nodes in the graph. In the first iteration, the initial nodes represent the factors from the original input inquiry. Although not shown, relation injection techniques may be performed to determine what relation type or types should be asked for a given node. Then, at 1060, one or more content sources (e.g., the Internet) are searched to identify one or more relations leading to new solutions. Depending on the number of independent questions generated, one or more QA systems may be called in parallel to discover new relations that answer the questions. These new answers extend the current inference graph as new additional nodes, each new additional node connected via an edge representing the relation, and each relation having an associated justifying passage with an associated probability or confidence level. The node filtering component may further be implemented to remove some of the new nodes (new relation end-points) from consideration based on a variety of pruning algorithms. Then, at 1065, the reasoner component infers, from the associated confidence levels, a confidence level at each node of the extended inference graph to provide an updated inference graph. Then, at 1070, the inference chaining system determines whether the updated inference graph meets a criterion for terminating the iteration. This determination is performed by the system depth controller element 117 described in greater detail herein above with respect to FIG. 11. At 1070, if it is determined that the termination criterion has not yet been met (neither the DT nor the CT level has been met or exceeded), the process proceeds back to 1055, and the question generation, searching, confidence inferring and termination determination steps are repeated in a next iteration, with the new additional nodes as the current nodes of the inference graph; otherwise, the iterations terminate.
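The loop of FIG. 3B can be sketched as follows, under simplifying assumptions: the inference graph is reduced to a dictionary of node confidences, a caller-supplied `extend` function stands in for question generation, PQA search and reasoning (steps 1055-1065), and the termination test at 1070 halts when either a depth threshold (DT) or a confidence threshold (CT) is met or exceeded. The function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Simplified stand-in for an inference graph: node label -> confidence.
Graph = Dict[str, float]


def build_inference_graph(
    factors: Graph,
    extend: Callable[[Graph], Graph],
    depth_threshold: int = 3,
    confidence_threshold: float = 0.9,
) -> Tuple[Graph, int]:
    """Iterate until the depth or confidence criterion is met (step 1070)."""
    graph = dict(factors)  # the initial nodes are the factors
    depth = 0
    while True:
        # Steps 1055-1065: generate questions, search, extend, infer confidences.
        graph = extend(graph)
        depth += 1
        inferred = [c for node, c in graph.items() if node not in factors]
        if depth >= depth_threshold or any(c >= confidence_threshold for c in inferred):
            return graph, depth  # halt: output the final inference graph
        # Otherwise the updated graph becomes the current graph for the next pass.


def toy_extend(graph: Graph) -> Graph:
    """Hypothetical extender: adds one inferred node whose confidence rises each pass."""
    updated = dict(graph)
    updated[f"inferred-{len(graph)}"] = 0.25 * len(graph)
    return updated


final, depth = build_inference_graph({"resting tremor": 1.0}, toy_extend,
                                     depth_threshold=10, confidence_threshold=0.5)
print(depth, final)
```

Here the confidence criterion fires first, on the second pass; lowering `depth_threshold` to 1 would make the depth criterion fire instead.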

FIG. 7 illustrates an example of the generation of a multi-step inference graph 90 performed by a text-based inference chaining system and method as described above. For example, in a medical domain inquiry regarding Parkinson's disease, an initial inference graph 110I may contain a node “resting tremor” among other nodes. For the question “what causes resting tremor”, the PQA system may return many possible answers with associated confidences, for example, Parkinson's Disease (32%), Dystonia (8%), . . . , Multiple system atrophy (3%). Assume, for this example, that “multiple system atrophy” is not an argument to any relation found for any of the other factors. Then its overall confidence value determined by the reasoner component would be very low. Alternatively, the node filtering component would assign a very low priority score to the “Multiple system atrophy” node (relative to more likely nodes such as Parkinson's Disease), and it could be pruned (removed from further consideration when extending the inference graph).

As shown in FIG. 7, the input is a question 92 in the medical domain:

-   -   A 63-year-old patient is sent to the neurologist with a clinical picture of resting tremor that began 2 years ago. At first it was only on the left hand, but now it compromises the whole arm. At physical exam, the patient has an unexpressive face and difficulty in walking, and a continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest. What part of his nervous system is most likely affected?

As shown, the factors 94 generated by the inference chaining system and method may include the following:

63-year-old

Resting tremor began 2 years ago

. . . in the left hand but now the whole arm

Unexpressive face

Difficulty in walking

Continuous movement in the left hand

In a first iteration of the inference chaining method, factors 94 obtained from the input query may be found associated with (i.e., relate to) inferred nodes 95, e.g., Parkinson's Disease 95A, or Athetosis 95B. From inferred node 95B, further answers 95C, 95D may be inferred from additional relations obtained in a further iteration of the inference chaining method. For each of the factors found for the medical domain example, a respective relation that associates the factor to an answer is created and represented as an edge in the inference graph. For example, for each of the following factors 94A in the medical domain example relating to the inferred answer Parkinson's Disease:

63-year-old

Resting tremor began 2 years ago

. . . Unexpressive face

the following relations, corresponding to respective justifying passages represented by respective edges of the inference graph found at a first inference chaining iteration, are listed below.

Edge: 96A indicates Parkinson's Disease by a discovered example justifying passage: “The mean age of onset of Parkinson's Disease is around 60 years.”

Edge: 96B indicates Parkinson's Disease by a discovered example justifying passage: “Resting tremor is characteristic of Parkinson's Disease.”

Edge: 96C indicates Parkinson's Disease by a discovered example justifying passage: “Parkinson's disease: A slowly progressive neurologic disease that is characterized by a fixed inexpressive face . . . ”

Further in the medical domain example, in a first iteration of the inference chaining method, factors 94B may each be found associated with (i.e., relate to) a node 95B, e.g., Athetosis. For example, for each of the following factors 94B in the medical domain example relating to the answer Athetosis:

Difficulty in walking

Continuous movement in the left hand

the following relations, corresponding to respective justifying passages with respective representative inference graph edges, are listed below.

Edge: 96D indicates Athetosis by a discovered example justifying passage: “Patients suffering from athetosis often have trouble in daily activities such as eating, walking, and dressing”

Edge: 96E indicates Athetosis by a discovered example justifying passage: “Athetosis is defined as a slow, continuous, involuntary writhing movement that prevents the individual from maintaining a stable posture.”

As shown in the graph of FIG. 7, the thickness of a relation (node graph edge) indicates a confidence level in the answer (e.g., a probability) and the strength of the associated relation. For the medical domain example, the inferred node Parkinson's Disease 95A relates most strongly to the factor “Resting tremor began 2 years ago,” as indicated by the thickness of edge 96B compared to the relation strengths represented by edges 96A and 96C.

Further in the medical domain example of FIG. 7, in a second or subsequent iteration of the inference chaining method described herein, further inferred nodes may be generated from each of the inferred nodes 95A and 95B via additional relations obtained by the inference chaining method.

For example, inferred node 95B Athetosis becomes a new factor from which new questions are generated, and new relations 97A and 97B are inferred by the PQA/reasoner implementation, leading to new inferred nodes Basal Ganglia 95C and Striatum 95D. The following are relations represented by respective inference graph edges based on the newly discovered nodes 95C, 95D:

Edge: 97A indicating Basal Ganglia 95C by a discovered example justifying passage: “Athetosis is a symptom primarily caused by the marbling, or degeneration of the basal ganglia.” In one embodiment, this discovered relation may have resulted from injecting a “caused by” or “affects” relation in a relation injection process.

Edge: 97B indicating Striatum 95D by a discovered example justifying passage: “Lesions to the brain, particularly to the corpus striatum, are most often the direct cause of the symptoms of athetosis.” In one embodiment, this discovered relation may have resulted from injecting a “caused by” relation in a relation injection process.

The thickness of node graph edges 97A, 97B indicates a confidence level in the answer (e.g., a probability) and the strength of the associated relation.

Further in the medical domain example of FIG. 7, in a further iteration of the inference chaining method, inferred nodes (or factors) 95A, 95C and 95D may each be further found associated with (i.e., relate to) new inferred nodes 98A-98E corresponding to candidate answers (new nodes) Cerebellum 98A, Lenticular nuclei 98B, Caudate nucleus 98C, Substantia nigra 98D and Pons 98E. In the inference chaining method, as shown in FIG. 7, inferred nodes 95A (Parkinson's Disease), 95C (Basal Ganglia) and 95D (Striatum) are each found to relate strongly to the inferred new node 98D (Substantia nigra) by the following relations represented by respective inference graph edges:

Edge: 93A indicating Substantia nigra by example justifying passage: “Parkinson's disease is a neurodegenerative disease characterized, in part, by the death of dopaminergic neurons in the pars compacta of the substantia nigra.” This relation may have been discovered by injecting a “caused by” relation in a relation injection process.

Edge: 93B indicating Substantia nigra by example justifying passage: “The pars reticulata of the substantia nigra is an important processing center in the basal ganglia.” This relation may have been discovered by injecting a “contains” relation in a relation injection process.

Edge: 93C indicating Substantia nigra by example justifying passage: “Many of the substantia nigra's effects are mediated through the striatum.” This relation may have been discovered by injecting an “associated with” relation in a relation injection process.

Although not shown, it is assumed that the inferred nodes 95 of the medical domain example of FIG. 7 may further indicate candidate answers 98A-98C and 98E by further respective edges and justifying passages.

As shown, the substantial thickness of edges 93A and 93B relating to the candidate answer Substantia nigra 98D indicates corresponding associated scores having a higher confidence. Furthermore, the answer node Substantia nigra 98D is shown having a substantially thicker border than the other candidate answers 98 because the overall confidence score for Substantia nigra 98D is higher than those of the other candidate answers. As such, Substantia nigra 98D would be the most likely candidate answer to the question 92, as reflected by the check mark.

FIG. 8 illustrates an embodiment of the factor analysis component 104 of the text-based inference chaining system and method of FIGS. 4-6 that cooperatively performs processes to generate, from a natural language inquiry, a set of factors that represents the initial nodes of an inference graph. The factor analysis component 104 includes a text analysis component 204, which may include a known system and program such as MetaMap, that receives natural language text/inquiry input and analyzes the input with a stack 210 of natural language processing (NLP) components. For more details on MetaMap, refer to Alan R. Aronson and Francois-Michel Lang, “An overview of MetaMap: Historical Perspective and Recent Advances,” J. Am. Med. Inform. Assoc., 2010, incorporated herein by reference. MetaMap is available at http://metamap.nlm.nih.gov/.

The NLP stack 210 components include, but are not limited to, relationship classification 210A, entity classification 210B, parsing 210C, sentence boundary detection 210D, and tokenization 210E processes. In other embodiments, the NLP stack 210 can be implemented by IBM's LanguageWare®, Slot Grammar as described in Michael C. McCord, “Using Slot Grammar,” IBM Research Report 2010, Stanford University's Parser as described in Marie-Catherine de Marneffe, et al., “Generating Typed Dependency Parses from Phrase Structure Parses,” LREC 2006, or other such technology components.

Factor identification component 208 implements processes for selecting factors and may include a process that selects all the entities classified as symptoms, lab-tests or conditions by the NLP stack 210. Factor weighting component 212 may implement such techniques as inverse document frequency (IDF) for producing weights for each of the factors.

Factor analysis component 104 identifies segments of the input inquiry text as “factors”. These may be terms, phrases or even entire sentences from the original input. A very simple implementation of factor identification, for example in the case of USMLE (United States Medical Licensing Examination®, see http://www.usmle.org/) questions, is to treat each actual sentence in the case as a factor.

In one embodiment, the factor identification takes as input a natural language inquiry and produces an initial inference graph containing one or more nodes; these nodes are referred to as the factors. A factor is a statement that is asserted to be true in the natural language inquiry. For example, in the medical domain, the inquiry may provide several observations about a patient and then ask a specific question about that patient, as in:

-   -   A 63-year-old patient is sent to the neurologist with a clinical picture of resting tremor that began 2 years ago. At first it was only on the left hand, but now it compromises the whole arm. At physical exam, the patient has an unexpressive face and difficulty in walking, and a continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest. What part of his nervous system is most likely affected?

The factor analysis component 104 may choose to generate factors at various levels of granularity. That is, it is possible for the text-based inference chaining system and method to use more than one factor identification component 208. The level of granularity is programmable because (1) questions are subsequently generated for the PQA system from each factor, and the quality of the PQA system's answers may depend on the size and amount of information content in the question; and (2) the resulting inference graph could be used to explain to a user what factors were indicative of different candidate answers. For example, if the factors are very coarse grained, such an explanation may have limited utility.

In one example, a factor analysis implementation might produce just one factor that contains all of the information in the inquiry. However, this level of granularity presents two problems: (1) the PQA system may not be as effective on a question that is generated from such a coarse-grained factor, and (2) even if a good answer can be produced, the resulting inference graph may not explain what part of the inquiry was most important in determining the decision, which is useful information for the user.

In a further factor analysis implementation example, the inquiry is divided by sentences. In the above-identified medical domain example, the factor analysis component would produce three separate factors (initial nodes in the inference graph), with the following statements:

-   -   1) A 63-year-old patient is sent to the neurologist with a clinical picture of resting tremor that began 2 years ago.
    -   2) At first it was only on the left hand, but now it compromises the whole arm.
    -   3) At physical exam, the patient has an unexpressive face and difficulty in walking, and a continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest.
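The sentence-level split can be sketched with a naive regular-expression splitter. This is an illustrative assumption rather than the NLP stack 210: it breaks after `.`, `!` or `?` followed by whitespace, which would mis-split abbreviations such as “Dr.”, and it drops the final interrogative sentence because that sentence is the question itself rather than an asserted fact.

```python
import re


def sentence_factors(inquiry: str) -> list[str]:
    """Split an inquiry into sentence-level factors (naive heuristic)."""
    # Split at whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", inquiry.strip())
    # Keep declarative sentences only; the trailing '?' sentence is the question.
    return [s for s in sentences if s and not s.endswith("?")]


inquiry = (
    "A 63-year-old patient is sent to the neurologist with a clinical picture of "
    "resting tremor that began 2 years ago. At first it was only on the left hand, "
    "but now it compromises the whole arm. At physical exam, the patient has an "
    "unexpressive face and difficulty in walking, and a continuous movement of the "
    "tip of the first digit over the tip of the second digit of the left hand is "
    "seen at rest. What part of his nervous system is most likely affected?"
)
for number, factor in enumerate(sentence_factors(inquiry), start=1):
    print(f"{number}) {factor}")
```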

To produce more fine-grained factors, natural language processing (NLP) components such as parsers, entity recognizers, relation detectors, and co-reference resolvers could be used. One use case for a co-reference resolver is the second factor 2) above, where it would be important to know that the word “it” refers to the “tremor”. Named entity recognizers are implemented to identify mentions of important domain concepts, such as symptoms in the medical domain. Relation detectors, often based on the parser output, can be used to identify whether those concepts are attributed to the patient. A factor analysis component 104 implementation based on such NLP analysis might then produce factors such as:

-   -   1) Patient is 63 years old
    -   2) Patient has resting tremor
    -   3) Tremor began 2 years ago
    -   4) Tremor was only on the left hand, but now it compromises the whole arm
    -   5) Patient has unexpressive face
    -   6) Patient has difficulty in walking
    -   7) Continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest.

As further shown, the factor weighting component 212 is useful because some factors may be more important than others in finding and scoring an answer. Various techniques are possible for initializing the confidence weighting of each factor. For example, the factor with the most unique terms relative to the domain may be given a higher weight than other factors. Known techniques, including inverse document frequency (IDF), can be used for producing weights for each of the factors. As shown, the resulting set of factors 215 is generated after the factor analysis process is complete, each factor representing the initial nodes 106 in an initial inference graph 110I.
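IDF-based factor weighting can be sketched as below. The tiny background corpus, the whitespace tokenizer, and the choice to score a factor by the mean IDF of its terms are all assumptions made for illustration; a real system would compute document frequencies over the domain corpus and would likely reuse the NLP stack's tokenization.

```python
import math
from collections import Counter


def idf_table(corpus: list[str]) -> dict[str, float]:
    """Inverse document frequency, log(N / df), for every term in the corpus."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc.lower().split()))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def factor_weight(factor: str, idf: dict[str, float], n_docs: int) -> float:
    """Mean IDF of the factor's terms; unseen terms are treated as rare (df = 1)."""
    terms = factor.lower().split()
    return sum(idf.get(t, math.log(n_docs)) for t in terms) / len(terms)


corpus = [
    "patient has tremor",
    "patient has fever",
    "patient has cough",
    "patient has unexpressive face",
]
idf = idf_table(corpus)
for factor in ["Patient has tremor", "Patient has unexpressive face"]:
    print(factor, round(factor_weight(factor, idf, len(corpus)), 3))
```

Terms such as “patient” and “has” occur in every document and get an IDF of zero, so the factor carrying more domain-specific terms receives the higher weight.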

Inference chaining systems 100, 100′, 100″ of respective FIGS. 4-6 for producing inference graphs over content to answer inquiries each use a probabilistic QA system 115 for discovering relations, and a parameterized question generation component 112 that generates questions that may be based on one or more independently generated relation types from a relation type injection component 130, which provides seed logical relations for generating questions for the PQA system 115.

FIG. 9 illustrates a further detailed embodiment 300 of the question generation component 112 of the text-based inference chaining system implementing a relation injection component 130 to generate natural language questions 315 from the input inquiry 101.

Question generation component 112 takes as input a node 106 from an initial inference graph 110I and produces as output one or more natural language questions 315, formatted in a manner suitable for processing by the PQA system 115 in order to elicit responses that will be used to assert new relations into the inference graph.

In one embodiment, the question generation component 112 performs processes to produce questions that ask for only one kind of relation, for example, the “causes” relation. A simple implementation could just produce questions of the form “What causes: X?” where X is the text of the inference graph node 106. Thus, from the above-described medical domain example, given the initial graph node 106

-   -   Patient has resting tremor

Question generation component 112 may generate the question:

-   -   What causes: Patient has resting tremor?

Another embodiment might produce more straightforward and grammatical questions, for example by applying question generation patterns or templates 125. An example of such a pattern could specify that the reference to a patient can be eliminated, which in the above medical domain example produces the question:

-   -   What causes resting tremor?

Depending on the PQA system 115, asking this question may result in improved answers. Question generation component 112 further implements programmed processes for producing questions that ask for many different kinds of relations (e.g., “causes”, “indicates”, “is associated with”, “treats”).
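Both forms of the “causes” question described above can be sketched in Python. The regular expression that drops a leading “Patient has” is a hypothetical example of a question generation pattern 125, not a pattern defined by the system.

```python
import re

# Hypothetical pattern 125: drop a leading "Patient has" so the question reads naturally.
PATIENT_PREFIX = re.compile(r"^patient\s+has\s+", re.IGNORECASE)


def simple_question(node_text: str) -> str:
    """Simplest form: embed the node text verbatim."""
    return f"What causes: {node_text}?"


def patterned_question(node_text: str) -> str:
    """Grammatical form: apply the pattern before embedding the node text."""
    return f"What causes {PATIENT_PREFIX.sub('', node_text)}?"


node = "Patient has resting tremor"
print(simple_question(node))     # What causes: Patient has resting tremor?
print(patterned_question(node))  # What causes resting tremor?
```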

As further shown in FIG. 9, relation type injection component 130 separates the logic of forming a natural language question for the PQA system 115 from the relation types used to seed those questions. Relation type injection component 130 implements processes to decide what relation type or types should be asked for a given graph node 106. Relation type injection component 130 may decide on the relation type by determining the type of the inference graph node 106 and possibly the target type that the natural language inquiry is asking for, for example, a disease, a location, an organ, a treatment, a drug, etc. For example, given an inference graph node 106 “Parkinson's Disease”, and with knowledge that the inquiry asked for a treatment, the injection component would generate the question “What treats Parkinson's Disease?” rather than “What causes Parkinson's Disease?”
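Relation type injection can be sketched as a lookup keyed on the node type and the answer type the inquiry asks for. The table contents, the type labels, and the default relation are hypothetical; in practice the node type might come from entity classification in the NLP stack.

```python
# Hypothetical injection table: (node type, requested answer type) -> relation types.
RELATION_TABLE = {
    ("disease", "treatment"): ["treats"],
    ("symptom", "disease"): ["causes", "indicates"],
}
DEFAULT_RELATIONS = ["causes"]


def inject_relation_types(node_type: str, target_type: str) -> list[str]:
    """Decide which relation types to ask about for a node."""
    return RELATION_TABLE.get((node_type, target_type), DEFAULT_RELATIONS)


def seed_questions(node: str, node_type: str, target_type: str) -> list[str]:
    """Seed one 'What <relation> <node>?' question per injected relation type."""
    return [f"What {rel} {node}?" for rel in inject_relation_types(node_type, target_type)]


print(seed_questions("Parkinson's Disease", "disease", "treatment"))
print(seed_questions("resting tremor", "symptom", "disease"))
```

Keeping the table separate from the question wording is the point of the design: the same question-forming code serves any relation type the injector chooses.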

The question generation component 112 then, in its general form, combines relation types 136 with question templates or patterns 125. For example, relation types 136 “causes”, “indicates” or “treats” can be applied to question templates 125 such as:

-   -   What <relation> <factor>?
    -   What <inverse-relation> <factor>?

This yields corresponding questions such as:

-   -   What causes <factor>?
    -   What is caused by <factor>?

where, depending on the node in the inference graph, the process may decide to substitute <factor> with the node phrase, for example:

-   -   “resting tremor”

which would produce the questions:

-   -   What causes a resting tremor?
    -   What indicates a resting tremor?
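Combining relation types with the two templates can be sketched as follows. The passive phrasings in `INVERSE_PHRASES` are assumptions for illustration; the factor text is substituted verbatim, so any article (“a resting tremor”) must already be part of the node phrase.

```python
# Templates 125 from the text; placeholders are filled with str.format.
TEMPLATES = ["What {relation} {factor}?", "What {inverse} {factor}?"]

# Hypothetical passive phrasings used for <inverse-relation>.
INVERSE_PHRASES = {
    "causes": "is caused by",
    "indicates": "is indicated by",
    "treats": "is treated by",
}


def expand_templates(relation_types: list[str], factor: str) -> list[str]:
    """Instantiate every template for every relation type."""
    questions = []
    for relation in relation_types:
        for template in TEMPLATES:
            questions.append(template.format(
                relation=relation, inverse=INVERSE_PHRASES[relation], factor=factor))
    return questions


for q in expand_templates(["causes", "indicates"], "a resting tremor"):
    print(q)
```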

As mentioned above in connection with FIGS. 4-6, an example implementation of the reasoner component 150 is now described in greater detail with respect to FIGS. 10 and 10A-10D.

FIG. 10 shows an implementation of the reasoner component 150 receiving as input an inference graph, such as extended inference graph 110E, with one or more statements identified as candidate endpoint nodes 151. The reasoner performs processes to generate from said input an output probability (or confidence level) for each statement at a node 151, for subsequent merging or reading back into the inference graph to form updated graph 110U.

In one embodiment, a method for computing probabilities at a node may include counting the number of paths to each node, and normalizing to make a number between 0 and 1 for each node.
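The path-counting embodiment can be sketched for an acyclic graph. Normalizing by the largest count is one assumed way of mapping the counts into [0, 1]. `graphlib` requires Python 3.9 or later, and a cycle would raise `graphlib.CycleError`, so the graph must already be acyclic (see the assimilation step below). The example edges loosely follow FIG. 7.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def path_scores(edges, factors):
    """Count paths from the factors to every node, then scale into [0, 1].

    edges: iterable of (head, tail) pairs; factors: set of starting nodes.
    """
    predecessors = defaultdict(set)
    nodes = set(factors)
    for head, tail in edges:
        predecessors[tail].add(head)
        nodes.update((head, tail))
    order = TopologicalSorter({n: predecessors[n] for n in nodes}).static_order()
    counts = {}
    for node in order:  # predecessors are always visited first
        start = 1 if node in factors else 0  # a factor is the trivial path to itself
        counts[node] = start + sum(counts[p] for p in predecessors[node])
    largest = max(counts.values())
    return {node: count / largest for node, count in counts.items()}


edges = [
    ("resting tremor", "Parkinson's Disease"),
    ("unexpressive face", "Parkinson's Disease"),
    ("difficulty in walking", "Athetosis"),
    ("Parkinson's Disease", "Substantia nigra"),
    ("Athetosis", "Basal Ganglia"),
    ("Basal Ganglia", "Substantia nigra"),
]
factors = {"resting tremor", "unexpressive face", "difficulty in walking"}
scores = path_scores(edges, factors)
print(scores["Substantia nigra"], scores["Parkinson's Disease"])
```

Substantia nigra is reached by three distinct factor paths, more than any other node, so it receives the top score of 1.0.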

In a further embodiment, shown as processes 153 and 155, a Bayesian network is generated from the inference graph. As shown in FIG. 10, the reasoning employed as programmed processes has the two steps described below.

Assimilation includes processes 153 to convert the set of relations into a valid Bayesian network having no cycles. Processes may optionally be performed to optimize the graph for inference by removing redundant paths. It is understood that the resulting valid Bayesian network may have a different structure from the input inference graph. For example, as depicted in FIG. 10, a cycle has been resolved by removal of relation 152 from the input inference graph 110E.
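Cycle removal by dropping the weakest link in each cycle, the strategy mentioned below for the FIG. 10A example, can be sketched with a depth-first search. This greedy loop is a simplification: it removes one weakest edge per detected cycle and does not compute a minimum feedback arc set. The relation strengths in the example are hypothetical values.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return the edges of one directed cycle as (head, tail) pairs, or None."""
    adjacency = defaultdict(list)
    for head, tail in edges:
        adjacency[head].append(tail)
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards
    stack = []

    def dfs(node):
        state[node] = "active"
        stack.append(node)
        for nxt in adjacency[node]:
            if state.get(nxt) == "active":  # back edge closes a cycle
                loop = stack[stack.index(nxt):] + [nxt]
                return list(zip(loop, loop[1:]))
            if nxt not in state:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        state[node] = "done"
        return None

    for start in list(adjacency):
        if start not in state:
            found = dfs(start)
            if found:
                return found
    return None


def assimilate(relations):
    """Drop the weakest relation of each cycle until the graph is acyclic.

    relations: dict mapping (head, tail) -> strength.
    """
    remaining = dict(relations)
    while (cycle := find_cycle(remaining)) is not None:
        weakest = min(cycle, key=remaining.get)
        del remaining[weakest]
    return remaining


relations = {
    ("tremor", "Parkinson's"): 0.9,            # R1: tremor indicates Parkinson's
    ("Parkinson's", "tremor"): 0.6,            # R2: Parkinson's causes tremor
    ("Parkinson's", "substantia nigra"): 0.8,  # R3
}
print(sorted(assimilate(relations)))
```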

Given the assimilated graph, inference includes processes 155 that use belief propagation to infer the probabilities of unknown nodes (i.e., candidates) from the probabilities of known nodes (i.e., factors). FIG. 10 shows the example nodes 151a, 151b from the input inference graph, where node 151a is shown having a thicker border, representing an event assertion having a greater computed confidence (higher probability) than the confidence value computed for the event assertion of candidate node 151b. One technique for performing belief propagation can be found in Yedidia, J. S., Freeman, W. T., et al., “Understanding Belief Propagation and Its Generalizations,” Exploring Artificial Intelligence in the New Millennium, Chap. 8, pp. 239-269, January 2003 (Science and Technology Books), incorporated by reference herein.

In the reasoner component 150, the inferred probabilities are then read back into the input inference graph, e.g., inference graph 110E, as shown at 157, by copying each number (probability value) computed from the Bayesian network to the corresponding node in the inference graph, which is passed to the merging process 156 with its structure unmodified.

In one embodiment, the reasoner component 150 does not return the assimilated Bayesian network. It leaves the input inference graph unchanged except for the computed (inferred) event probabilities, as output inference graph 110U at 159. It is further understood that explanations may be generated by describing the edges along the strongest path (the path with the most propagated belief) from known factors to the chosen candidate, e.g., node 151a.

In FIG. 10A, for the medical domain example, the reasoner component 150 receives data representing an example inference graph 161 including a set of relations R, whereby the inference graph includes (1) a relation R1 indicating that tremor indicates Parkinson's; (2) a relation R2 indicating that Parkinson's causes tremor; and (3) a relation R3 indicating that Parkinson's indicates substantia nigra. Inference chaining may find a set of relations from a factor “tremor” to produce candidate answers such as Basal ganglia (not shown) and a candidate answer Pons 163, as shown in FIG. 10A.

More generally, with reference to FIGS. 4-6, the data structures input and output by the reasoner component 150 are as follows. The input is an object called an “inference question,” which includes: (1) a collection of relations, where a relation has a head node, a tail node, a type, and a strength; (2) a collection of nodes identified as factors, with assigned probabilities; and (3) a collection of nodes identified as candidates, whose probability is not necessarily known. The reasoner component 150 output includes a probability for each node in the graph, including candidates. The reasoner component 150 may optionally output an explanation for why each candidate received the probability that it did.
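The “inference question” input described above can be sketched as Python dataclasses. To keep the example short, the probability computation is a swapped-in simplification, not the belief propagation referenced earlier: it assumes an acyclic graph, treats each relation strength as the probability that a true head makes the tail true, combines incoming relations with a noisy-OR, and propagates forward from the factors only. All class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class Relation:
    head: str
    tail: str
    relation_type: str
    strength: float


@dataclass
class InferenceQuestion:
    relations: list[Relation]
    factors: dict[str, float]  # factor node -> known probability
    candidates: set[str]       # candidate nodes, probability not yet known


def propagate(question: InferenceQuestion) -> dict[str, float]:
    """Assign a probability to every node (factors, intermediates, candidates)."""
    incoming = defaultdict(list)
    nodes = set(question.factors) | question.candidates
    for rel in question.relations:
        incoming[rel.tail].append(rel)
        nodes.update((rel.head, rel.tail))
    graph = {n: {rel.head for rel in incoming[n]} for n in nodes}
    prob = {}
    for node in TopologicalSorter(graph).static_order():
        if node in question.factors:
            prob[node] = question.factors[node]  # known probability is kept
            continue
        # Noisy-OR: the node stays false only if every incoming relation fails.
        all_fail = 1.0
        for rel in incoming[node]:
            all_fail *= 1.0 - prob[rel.head] * rel.strength
        prob[node] = 1.0 - all_fail
    return prob


question = InferenceQuestion(
    relations=[
        Relation("tremor", "Parkinson's", "indicates", 0.8),
        Relation("Parkinson's", "substantia nigra", "indicates", 0.5),
        Relation("tremor", "pons", "indicates", 0.1),
    ],
    factors={"tremor": 1.0},
    candidates={"substantia nigra", "pons"},
)
print(propagate(question))
```

Because probabilities are returned for intermediate nodes such as “Parkinson's” as well as for candidates, the output also fits the requirement, stated below, that the reasoner assign a probability to all nodes.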

The reasoner component 150 is programmed to assign a probability to all nodes, not just candidates, because the question generation component 112 may give higher priority to some non-candidate nodes based on their propagated probability. One particular implementation includes a Bayesian network, but the reasoner component may implement other techniques.

For example, the Bayesian network may be used for training the probabilistic QA system as follows. The correct answer is asserted by setting its probability to 1, and the incorrect answers are disasserted by setting their probabilities to 0. Belief is then propagated through the graph. Edges that pass positive messages can be used as positive training examples, and edges that pass negative messages can be used as negative training examples.

Because the inference graph 161 of FIG. 10A may not form a valid Bayesian network, as relations R1 and R2 form a cycle, the reasoner component 150, as part of the assimilation component of the reasoning processes, implements processes to convert the inference graph to a valid Bayesian network, for instance, by dropping the weakest link in each cycle. As shown in FIG. 10B, edges “E1” and “E2” are edges in the Bayesian network 164 corresponding to the inference graph 161 shown in FIG. 10A. In a first reasoner inference step, factors are assigned their known probabilities, resulting in a Bayes net 165 shown in FIG. 10C. For illustrative purposes, the factor “tremor” 168 is shown as having a probability as indicated by the thickness of the node border. In a second reasoner inference step, beliefs are propagated through the graph, resulting in Bayes net 167 shown in FIG. 10D, with each node having an assigned probability based on the propagated beliefs. Then, as shown in FIG. 10E, the probabilities generated from Bayes network 167 are read back to populate the corresponding nodes in the original inference graph 161 of FIG. 10A, now showing the reasoned probabilities by respective border thicknesses.

FIG. 10F shows that the inference graph 161 may be but one part of a complex network 160 of interconnected nodes and edges.

In FIGS. 10C-10F, for illustrative purposes, the thickness of a border 168 of a node indicates how probable that event is. Likewise, the thickness of an edge 169 represents the strength of the strongest message that is passed along that edge. For example, a thicker node border 168 of candidate node 162 as compared to border 168 of candidate node 163 indicates a more probable candidate. In FIG. 10D, nodes 106a, 106b, 106c represent factors (events whose probability is known), while nodes 162, 163, and 164 represent candidate answers, i.e., nodes which play a role in answering the question. Other nodes of the graph are also shown.

Although not shown in the visualization 160 in FIG. 10E of the medical domain example, the probabilities underlying the graph nodes are values between 0 and 1 representing the event probability and message strength. For this purpose, an answer probabilities table 199 is further displayed, representing the outputs of the updated graph. From the perspective of the reasoner component 150, these are the probabilities of each answer after the graph has been assimilated and propagated, normalized so that they sum to one. These outputs 199 represent the output of the text-based inference chaining system for the medical domain example, with higher probabilities indicating the better candidate answers.
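The normalization behind an answer probabilities table such as table 199 can be sketched as follows, assuming the reasoner's per-node probabilities are available as a dictionary. The input values are made up for illustration and are not results from the described system; note that only candidate nodes enter the table, so the factor “tremor” is ignored.

```python
def answer_table(probabilities: dict[str, float],
                 candidates: list[str]) -> list[tuple[str, float]]:
    """Normalize candidate probabilities to sum to one, best candidate first."""
    total = sum(probabilities[c] for c in candidates)
    if total == 0:
        raise ValueError("no candidate received any probability")
    table = [(c, probabilities[c] / total) for c in candidates]
    return sorted(table, key=lambda row: row[1], reverse=True)


raw = {"Substantia nigra": 0.6, "Pons": 0.1, "Cerebellum": 0.2, "tremor": 1.0}
table = answer_table(raw, ["Substantia nigra", "Pons", "Cerebellum"])
for answer, p in table:
    print(f"{answer}: {p:.2f}")
```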

Thus, the text-based inference chaining system 100, 100′, 100″ of FIGS. 4-6 provides an inference graph generator system and method for producing inference graphs over unstructured content to answer inquiries using a probabilistic QA system for discovering relations. Further, as will be described with respect to FIG. 12 below, the text-based inference chaining system 100, 100′, 100″ of FIGS. 4-6, or the various combinations thereof, may be programmed to employ a bi-directional graph generation inquiry solution strategy.

As shown in FIG. 12, a system and method may produce inference graphs by independently, and optionally in parallel (simultaneously), performing forward inference from factors extracted from the inquiry and backward inference from hypothetical answers produced by a hypothesis, or candidate answer, generator.

FIG. 12 shows a text-based inference chaining system and method employing a bi-directional graph generation inquiry solution strategy. From the initial input inquiry 101, the chaining system 100 performs factor-directed processes 400 that generate a final forward inference graph 110FF. Either in parallel or concurrently, the chaining system 100 performs hypothesis-directed processes 500 that generate a final backward inference graph 110FB, with possible solutions indicated as end-point nodes 514 of the inference graph. That is, in one embodiment, to better manage graph generation from the factors and reduce the time it takes to find paths to possible solutions, the process includes generating a forward-directed graph from the factors and a backward-directed graph from candidate answers 515, looking for a bridge, i.e., a meeting point where a relation can be found joining end-points of each graph, and then joining the graphs. A programmed inference graph joiner component 600 looks for a bridge that joins the graphs, producing final inference graph 610.
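The joiner's search for a bridge can be sketched as a pairwise check between the end-points of the forward and backward graphs. Here `relation_between` is a hypothetical callback standing in for asking the PQA system whether a relation joins two nodes; in the example it is a plain dictionary lookup with an assumed relation.

```python
from typing import Callable, Optional


def find_bridges(
    forward_ends: list[str],
    backward_ends: list[str],
    relation_between: Callable[[str, str], Optional[str]],
) -> list[tuple[str, str, str]]:
    """Return (forward node, backward node, relation) for each joinable pair."""
    bridges = []
    for f in forward_ends:
        for b in backward_ends:
            if f == b:
                bridges.append((f, b, "same node"))  # the two graphs already meet
                continue
            relation = relation_between(f, b)
            if relation is not None:
                bridges.append((f, b, relation))
    return bridges


# Assumed relation that a PQA system might confirm between two end-points.
KNOWN = {("Parkinson's Disease", "Substantia nigra"): "caused by"}

bridges = find_bridges(
    ["Parkinson's Disease", "Athetosis"],
    ["Substantia nigra", "Pons"],
    lambda f, b: KNOWN.get((f, b)),
)
print(bridges)
```

The pairwise loop grows with the product of the two frontier sizes, which is one reason a depth controller limiting both graphs is useful when they do not meet.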

FIG. 13 illustrates the factor-directed, or forward-directed, inference graph generation iterative process 400, which functions as the programmed text-based inference chaining system 100, 100′, 100″ of FIGS. 4-6, or combinations thereof. In this embodiment, an initial, or original, forward inference graph 110IF is constructed that includes factors 406 extracted from an initial input inquiry 101 as initial nodes of the initial inference graph 110IF. At each iteration, the previous forward inference graph is labeled 110PF (in a first iteration of processing, it is the initial forward inference graph 110IF); an extended forward inference graph 110EF is generated by the graph extender 118; and an updated forward inference graph 110UF, with nodes having confidence values, is generated by the reasoner component 150. The depth controller component 175 halts the iteration and outputs the updated inference graph 110UF as the final forward inference graph 110FF at a specified depth or when at least one discovered relation accumulates confidence over a given threshold. Otherwise, the updated inference graph 110UF becomes the current inference graph and a new input to the question generation component 112, and the cycle 99 iterates. The final forward inference graph 110FF includes the factors identified from the inquiry and new nodes, with confidence values, that were inferred from those factors. For the medical domain example, after factor identification processing and forward-directed graph generation, the final inference graph may include the following example inferred nodes with confidence values:

-   Patient has Parkinson's Disease: 0.8
-   Patient has Dystonia: 0.15
-   Patient has Athetosis: 0.03
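A rough Python sketch of the forward loop and its depth controller follows. The `extend` callable is a hypothetical stand-in for question generation, the PQA system and the graph extender, and the product-along-path confidence is a simplifying assumption in place of the reasoner's probabilistic propagation. The factor name and relation table are invented; only the three confidence values echo the list above.

```python
from collections.abc import Callable, Iterable

# Hypothetical stand-in for question generation + PQA + graph extension:
# for a node, return (new_node, relation_confidence) pairs.
Extend = Callable[[str], Iterable[tuple[str, float]]]


def forward_generate(
    factors: Iterable[str],
    extend: Extend,
    max_depth: int = 3,
    threshold: float = 0.7,
) -> tuple[dict[str, float], list[tuple[str, str, float]], int]:
    """Iteratively extend a forward inference graph from the factors.

    Node confidence is the best product of relation confidences along any
    path from a factor, a simplifying stand-in for the reasoner. The depth
    controller stops at max_depth, or earlier once a discovered relation has
    accumulated confidence above the threshold.
    Returns (node confidences, edges, depth reached).
    """
    factors = list(factors)
    confidence = {f: 1.0 for f in factors}  # factors are taken as given
    edges: list[tuple[str, str, float]] = []
    frontier = set(factors)
    for depth in range(1, max_depth + 1):
        next_frontier: set[str] = set()
        best = 0.0
        for node in sorted(frontier):  # sorted for deterministic output
            for target, rel_conf in extend(node):
                accumulated = confidence[node] * rel_conf
                edges.append((node, target, rel_conf))
                if target not in confidence:
                    next_frontier.add(target)
                confidence[target] = max(confidence.get(target, 0.0), accumulated)
                best = max(best, accumulated)
        if best > threshold or not next_frontier:
            return confidence, edges, depth
        frontier = next_frontier
    return confidence, edges, max_depth


# Toy relations echoing the example confidences above (hypothetical factor).
RELATIONS = {
    "patient has resting tremor": [
        ("Patient has Parkinson's Disease", 0.8),
        ("Patient has Dystonia", 0.15),
        ("Patient has Athetosis", 0.03),
    ]
}
conf, edges, depth = forward_generate(
    ["patient has resting tremor"], lambda n: RELATIONS.get(n, [])
)
print(depth, conf["Patient has Parkinson's Disease"])
```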

FIG. 14 illustrates the hypothesis-directed inference graph generation iterative process 500, which functions similarly to the programmed text-based inference chaining systems 100, 100′, 100″ of FIGS. 4-6, or combinations thereof, but implements a candidate answer generator 504 to produce the initial nodes in constructing the backward inference graph 110IB. In this embodiment, the initial backward inference graph is labeled 110IB and the current backward inference graph is labeled 110PB; the extended backward inference graph 110EB is generated by the graph extender 118; and a new, revised inference graph 110UB (after a first iteration of processing, for example) is generated by the reasoner component 150. In this embodiment of process 500, the candidate answer generator 504 performs programmed processes to receive and analyze the input inquiry 101. The candidate answer generator 504 uses different techniques to produce many possible (candidate) answers or solutions that represent different “hypotheses,” each of which becomes an initial node 506 in a backward inference graph 110IB and may be connected by the system to some subset of factors in the final output bi-directional inference graph. Further, the depth controller 175 halts the iteration and outputs the new inference graph as the final backward graph 110FB at a specified depth. Otherwise, the new inference graph, e.g., graph 110UB, becomes the new input to the question generation component 112 and the cycle 99 iterates.

In backward-directed graph generation, processes are implemented to access a candidate answer generator 504 that receives the inquiry and conducts a search using known methods to produce possible answers (e.g., parts of the nervous system) based on the inquiry. For the above-described medical domain example (see FIG. 10D), the candidate answers generated may include: (1) Substantia nigra, (2) Caudate nucleus, (3) Lenticular nuclei, (4) Cerebellum and (5) Pons.

In backward-directed graph generation, components of the text-based chaining system 100, 100′, 100″ of FIGS. 4-6, or combinations thereof, extend this graph. In particular, the question generation component 112 generates natural language questions suitable for input to the PQA system, such as:

-   What causes Substantia Nigra to be affected?
-   What causes Caudate nucleus to be affected?

The PQA system component 115 is invoked to produce answers to these questions. For example, Parkinson's Disease causes Substantia Nigra to be affected. The graph extender component 118 adds these answers as edges to the backward-directed graph. Multiple iterations may be performed to form longer paths in the inference graph.
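The backward extension step can be sketched as follows, assuming a hypothetical `pqa` callable that maps a question to (answer, confidence) pairs. The question template is the one used in the examples above; the `min_conf` cutoff and the toy answer table are assumptions for illustration.

```python
from collections.abc import Callable

# Template taken from the example questions above.
QUESTION_TEMPLATE = "What causes {} to be affected?"

# Hypothetical PQA interface: question -> list of (answer, confidence).
PQA = Callable[[str], list[tuple[str, float]]]

Graph = dict[str, dict[str, float]]  # cause -> {effect: relation confidence}


def extend_backward(graph: Graph, nodes: set[str], pqa: PQA, min_conf: float = 0.1) -> set[str]:
    """One backward iteration: ask the templated question about each node and
    add an edge answer -> node for every answer returned with at least
    min_conf. Returns the answers that were added as causes."""
    added: set[str] = set()
    for node in sorted(nodes):
        for answer, conf in pqa(QUESTION_TEMPLATE.format(node)):
            if conf >= min_conf:
                graph.setdefault(answer, {})[node] = conf
                added.add(answer)
    return added


def generate_backward(candidates: list[str], pqa: PQA, iterations: int = 2) -> Graph:
    """Run several iterations so that paths can grow longer than one edge."""
    graph: Graph = {}
    seen = set(candidates)
    frontier = set(candidates)
    for _ in range(iterations):
        frontier = extend_backward(graph, frontier, pqa) - seen
        if not frontier:
            break
        seen |= frontier
    return graph


# Toy PQA answers (hypothetical).
ANSWERS = {"What causes Substantia nigra to be affected?": [("Parkinson's Disease", 0.9)]}
graph = generate_backward(["Substantia nigra", "Caudate nucleus"], lambda q: ANSWERS.get(q, []))
print(graph)
```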

In one embodiment, the candidate answer generator may be implemented using the same methods used in IBM's DeepQA system for candidate answer generation, such as described below with respect to FIG. 19. Generally, candidate answer generation implements processes that break the input query into query terms, the query terms having searchable components. Then, a search engine built into or accessed by the QA system conducts a first search of the content using one or more of the searchable components to obtain documents including candidate answers. The documents may be analyzed to generate a set of candidate answers. Then, a further search may be conducted in the content using the candidate answers and the searchable components of the query terms to obtain one or more supporting passages, the supporting passages having at least one of said candidate answers and at least one of said searchable components of the query terms. A confidence level of these candidate answers may be determined using a scoring technique, as known in the art, for scoring the supporting passages.
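The two-phase procedure above can be illustrated on a toy corpus. This Python sketch is not DeepQA's implementation: it assumes document titles serve as candidate answers, uses a made-up stopword list, and scores each candidate by the fraction of query terms in its best supporting passage, a deliberately simple stand-in for the passage-scoring techniques known in the art.

```python
import re

# Small illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"the", "of", "a", "an", "is", "which", "what", "to", "in", "by", "and"}


def query_terms(text: str) -> set[str]:
    """Break text into lowercase searchable terms, minus stopwords."""
    return {t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS}


def generate_candidates(
    query: str, documents: dict[str, str], passages: list[str]
) -> list[tuple[str, float]]:
    """Two-phase candidate generation over a toy corpus.

    Phase 1: every document sharing a term with the query contributes its
    title as a candidate answer. Phase 2: passages containing both the
    candidate and at least one query term are supporting passages; a
    candidate's score is the fraction of query terms in its best passage.
    """
    terms = query_terms(query)
    candidates = [title for title, text in documents.items() if terms & query_terms(text)]
    scores: dict[str, float] = {}
    for cand in candidates:
        best = 0.0
        for passage in passages:
            shared = terms & query_terms(passage)
            if cand.lower() in passage.lower() and shared:
                best = max(best, len(shared) / len(terms))
        scores[cand] = best
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


DOCUMENTS = {
    "Substantia nigra": "The substantia nigra is a brain structure in the midbrain.",
    "Pons": "The pons is part of the brainstem.",
}
PASSAGES = [
    "Parkinson's disease damages neurons of the substantia nigra in the brain.",
    "The pons is part of the brainstem.",
]
QUERY = "Which part of the brain is damaged by Parkinson's disease?"
print(generate_candidates(QUERY, DOCUMENTS, PASSAGES))
```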

FIG. 15 illustrates the implementation of an inference graph joiner process 600 to merge nodes of, or join, the respective forward- and backward-directed graphs obtained by the programmed inference chaining engines described in FIG. 13 and FIG. 14. In this embodiment, like elements in FIG. 15 function identically to the inference chaining system and the various embodiments described herein with respect to FIGS. 4-6, 13 and 14, to provide a system and method for producing a single integrated output inference graph through parallel (i.e., simultaneous) bi-directional graph generation, running forward, or factor-directed, and backward, or hypothesis-directed, inference graph generation processes. The method uses a depth controller to limit the generation of both paths if the nodes do not meet, and an inference graph joiner process 600 to force the discovery of relations that may join the answers to factors in the inquiry. The inference graph joiner process 600 is implemented by a computer system that receives as input both the nodes and relations data representing the final forward inference graph 110FF and the final backward graph 110FB.

The inference graph joiner process 600 joins two paths from factors through intermediate nodes to possible answers; specifically, it connects forward-generated inference graphs with backward-generated inference graphs. A first, optional step in graph joining is node merging at node merging element 665. Node merger 665 implements programmed processes to analyze the different concept end-points within the bi-directionally generated graphs and probabilistically determine whether they refer to the same logical statements (concepts).

If any two different nodes in the graph are probabilistically determined, with enough certainty, to refer to the same concept, then they are merged into a single node, reducing the number of extraneous or noisy paths in the graph that would dilute the confidence propagation. Node merging may further automatically connect, or join, two graphs (bi-directionally generated or not). This happens when the merged nodes came from distinct graphs that the system was trying to join. The implicit question answered by the node merger is “Do these two nodes refer to the same logical statement?” Thus, unlike the node joiner, the node merger does not need to pose an explicit question to the PQA system to join the nodes. Merging may be performed using any number of term matching or co-reference techniques, known in the art, that look at syntactic, semantic or contextual similarity. The MetaMap program referred to hereinabove is one example system that may be implemented in the medical domain: given two terms, MetaMap may be used to determine whether they refer to the same medical concept. In general, any “domain dictionary” that identifies synonymous terms for a given domain can be used in this way. As other medical domain examples, Diabetes may be merged with Diabetes Mellitus, Cold with Cold Virus, or High Blood Pressure with Hypertension. Node joining performance improves when the joiner connects the merged node into the other graph rather than connecting the original nodes separately.
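A dictionary-based merger can be sketched as below, assuming a hypothetical `SYNONYMS` table that maps surface forms to canonical concepts (a tool such as MetaMap could fill this role in the medical domain). Keeping the highest confidence for edges that collapse together is an assumption of the sketch.

```python
# Hypothetical domain dictionary: surface form -> canonical concept.
SYNONYMS = {
    "diabetes": "Diabetes Mellitus",
    "diabetes mellitus": "Diabetes Mellitus",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
    "cold": "Cold Virus",
    "cold virus": "Cold Virus",
}


def canonical(node: str) -> str:
    """Map a node label to its canonical concept, or leave it unchanged."""
    return SYNONYMS.get(node.lower(), node)


def merge_nodes(edges: list[tuple[str, str, float]]) -> list[tuple[str, str, float]]:
    """Merge nodes that the dictionary says denote the same concept.

    edges: (source, target, confidence) triples from one or more graphs.
    Parallel edges created by a merge keep the highest confidence, and
    self-loops created by merging both endpoints are dropped.
    """
    merged: dict[tuple[str, str], float] = {}
    for src, dst, conf in edges:
        a, b = canonical(src), canonical(dst)
        if a == b:
            continue
        merged[(a, b)] = max(conf, merged.get((a, b), 0.0))
    return [(a, b, c) for (a, b), c in merged.items()]


# One edge from a forward graph, one from a backward graph (toy values).
merged = merge_nodes([("obesity", "Diabetes", 0.7), ("Diabetes Mellitus", "Blindness", 0.6)])
print(merged)
```

In the usage example, “Diabetes” from one graph and “Diabetes Mellitus” from the other collapse into one node, so the two edges now form a single connected path without any PQA call.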

After the optional node merger 665 is invoked, the node joiner element 675 implements programmed processes to detect relation end-points that are not on a path connecting a factor to an answer, and attempts to discover a link between them (the factor and the answer) using the question generation and PQA components of the system.

Particularly, the joiner process 675 receives both bi-directionally generated graphs and searches for two disconnected nodes (one from each graph) that may be connected by a relation. For example, one backward-directed graph node is “Diabetes” and the other node is “Blindness”. The node joiner generates questions that may link the two nodes. For example:

-   Does Diabetes cause Blindness?

As shown in FIG. 15, a determination is made as to whether the PQA system component 115 answers with sufficient confidence that a new relation is asserted connecting the nodes and a new path is built. If so, the graphs are joined. In one embodiment, the node joiner may attempt to connect all pairs of leaf nodes in the two graphs and then rank the paths based on the propagated confidences. Alternatively, it may select only some pairs of nodes based on their types.
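The joining loop can be sketched in Python as below. The `ask` callable is a hypothetical yes/no interface to the PQA system that returns the confidence of a “yes” answer; the threshold and the product used to rank bridges are assumptions standing in for full confidence propagation.

```python
from collections.abc import Callable
from itertools import product

# Hypothetical yes/no PQA interface: question -> confidence that the answer is "yes".
YesNoPQA = Callable[[str], float]


def join_graphs(
    forward_leaves: dict[str, float],
    backward_leaves: dict[str, float],
    ask: YesNoPQA,
    threshold: float = 0.5,
) -> list[tuple[str, str, float, float]]:
    """Try to bridge every (forward leaf, backward leaf) pair.

    Each leaf maps to its propagated confidence. A relation is asserted when
    the PQA confidence for the yes/no question reaches the threshold.
    Returns (forward node, backward node, relation confidence, path score)
    tuples, best path score first; the score is the product of the three
    confidences (a simplifying assumption).
    """
    bridges = []
    for (f, f_conf), (b, b_conf) in product(forward_leaves.items(), backward_leaves.items()):
        rel_conf = ask(f"Does {f} cause {b}?")
        if rel_conf >= threshold:
            bridges.append((f, b, rel_conf, f_conf * rel_conf * b_conf))
    return sorted(bridges, key=lambda bridge: bridge[3], reverse=True)


# Toy PQA confidences (hypothetical).
YES_CONF = {"Does Diabetes cause Blindness?": 0.85}
bridges = join_graphs(
    {"Diabetes": 0.9},
    {"Blindness": 0.8, "Deafness": 0.7},
    ask=lambda q: YES_CONF.get(q, 0.05),
)
print(bridges)
```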

With respect to the inference graph joiner process 600 of FIG. 15, there are two cases to consider in any implementation: 1) the forward and backward inference graphs may naturally intersect; or 2) the forward and backward inference graphs do not intersect.

In the medical domain example, the forward-directed and backward-directed inference graphs naturally intersect. The forward-directed graph includes the end-point “Parkinson's Disease” with high confidence, and the backward-directed graph includes the relation Parkinson's Disease causes Substantia Nigra to be affected, so when the graphs are combined there is a path leading from the initial factors to the candidate answer, and the iterative process terminates.

FIG. 16 depicts an example node joiner process that attempts to combine the bi-directionally generated inference graphs by looking for relations between the end-point nodes 514 of the forward-directed graph, e.g., graph 110FF, and a node in the backward-directed graph, e.g., graph 110FB. In one embodiment, this is performed by asking “yes”/“no” or multiple-choice questions to the PQA system component 115. In one embodiment, FIG. 16 shows a relation 516, produced by the inference graph joiner process 600, that joins a node 524 of the final forward inference graph 110FF and a node 526 of the final backward inference graph 110FB. This relation 516 is drawn thicker than another discovered relation 517, indicating that it has the highest computed confidence level among the identified relations, with a corresponding justifying passage supporting the joining of end-point nodes 524 and 526 of the final inference graph. The node 526 is drawn with a thicker border, indicating the highest computed probability of being a correct solution or answer, for example, as compared to end-point node 525, which may be joined as a result of finding the other discovered relation 517 of weaker confidence level.

For the medical domain example described herein, the programmed joiner process may provide example “Yes/No” questions that are generated by the question generation component for processing in the PQA system component 115. Examples are shown below.

-   Does Parkinson's Disease cause Substantia nigra to be affected?
-   Does Parkinson's Disease cause Caudate nucleus to be affected?
-   . . .

For the medical domain example described herein, example multiple-choice questions that are generated for processing in the PQA system component 115 may include:

-   Parkinson's Disease causes which of the following to be affected: (Substantia nigra, Caudate nucleus, Lenticular nuclei, Cerebellum, Pons)
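Both question forms can be produced from simple templates. This sketch assumes the forward end-point and the backward candidate answers are already known; the function names are hypothetical.

```python
def yes_no_questions(cause: str, candidates: list[str]) -> list[str]:
    """One yes/no question per candidate answer."""
    return [f"Does {cause} cause {c} to be affected?" for c in candidates]


def multiple_choice_question(cause: str, candidates: list[str]) -> str:
    """A single question that lists every candidate answer as a choice."""
    return f"{cause} causes which of the following to be affected: ({', '.join(candidates)})"


CANDIDATES = ["Substantia nigra", "Caudate nucleus", "Lenticular nuclei", "Cerebellum", "Pons"]
for question in yes_no_questions("Parkinson's Disease", CANDIDATES[:2]):
    print(question)
print(multiple_choice_question("Parkinson's Disease", CANDIDATES))
```

One plausible reason to prefer the multiple-choice form is cost: it needs a single PQA call instead of one call per candidate, leaving the PQA system to discriminate among the choices itself.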

FIGS. 17A-17B illustrate one example of inference graph computation according to the embodiments described herein. From an input inquiry 601:

-   ON HEARING OF THE DISCOVERY OF GEORGE MALLORY'S BODY, THIS EXPLORER TOLD REPORTERS HE STILL THINKS HE WAS FIRST.

Processing this inquiry using one or more of the text analysis, factor identification and factor weighting components of the factor analysis component 200 of FIG. 8 obtains the following factors 606A, 606B:

-   606A: GEORGE MALLORY from “DISCOVERY OF GEORGE MALLORY'S BODY”
-   606B: FIRST EXPLORER from “THIS EXPLORER TOLD REPORTERS HE STILL THINKS HE WAS FIRST”

These factors are the initial nodes generated from the query. They are processed simultaneously along parallel processing paths 605A, 605B, supported by the computing system described herein, using respective question generation components 612A, 612B, which generate respective questions 613A, 613B:

-   613A: This is associated with George Mallory
-   613B: This is associated with First Explorer

Via parallel implementations of the PQA systems 615A, 615B, the following justifying passages 620A-620C are obtained from the searched (structured+unstructured) content.

-   620A: George Herbert Leigh Mallory (18 Jun. 1886-8/9 Jun. 1924) was an English mountaineer who took part in the first three British expeditions to Mount Everest in the early 1920s.
-   620B: A mountaineering expert will today claim that Sir Edmund Hillary was not the first man to scale Everest—and that it was in fact conquered three decades before by the British climber George Mallory.
-   620C: Sir Edmund Hillary was a mountain climber and Antarctic explorer who, with the Tibetan mountaineer Tenzing Norgay, was the first to reach the summit of Mount Everest.

As a result of the confidence-propagation processes of the reasoner component 150, the following candidate answers 622A, 622B are generated:

-   622A: Mount Everest
-   622B: Edmund Hillary

The increased thickness of the border for answer Edmund Hillary 622B indicates the relatively higher confidence (score) computed for it by the reasoner component 150, from which it can be determined to be the best answer.
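One way a reasoner might combine several supporting relations into a node confidence is a noisy-OR, a standard conditional model in Bayesian networks. The Python sketch below is an assumption for illustration, not the reasoner component 150 itself, and the relation confidences are invented: Edmund Hillary receives two supporting relations (echoing passages 620B and 620C) while Mount Everest receives one.

```python
from math import prod


def noisy_or(incoming: list[tuple[float, float]]) -> float:
    """Combine incoming evidence into one node confidence with a noisy-OR.

    Each (parent_confidence, relation_confidence) pair is treated as an
    independent cause that activates the node with probability
    parent * relation; the node is believed if at least one cause fires.
    With no incoming evidence the confidence is 0.
    """
    return 1.0 - prod(1.0 - parent * relation for parent, relation in incoming)


# Invented relation confidences; factor nodes are taken as certain (1.0).
hillary = noisy_or([(1.0, 0.7), (1.0, 0.4)])  # two supporting relations
everest = noisy_or([(1.0, 0.6)])              # one supporting relation
print(f"Edmund Hillary: {hillary:.2f}, Mount Everest: {everest:.2f}")
```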

FIG. 17A further shows the inference graph 610A generated during a single iteration of parallel processing path 605A, in which the initial node (factor 606A) is associated with the candidate answer Mount Everest 622A (as supported by a justifying passage). Likewise, parallel processing path 605B generates inference graph 610B, in which the initial node (factor 606B) is associated with Edmund Hillary as candidate answer 622B, which has the highest computed confidence, as indicated by the thickest border.

FIGS. 17A and 17B further show the node joiner process 675, which joins the inference graphs 610A, 610B formed in parallel. The inference graph join process first determines the generated candidate answers and then determines whether these lead to a single correct answer.

The join is used to determine how confidence flows between two possible answers (e.g., Mt. Everest and Edmund Hillary) discovered from different factors in the question. Edmund Hillary was also a candidate answer for the first factor, discovered from the passage annotating that link.

In the method shown in FIG. 17A, generated candidate answers may be treated as factors from which a question may be generated for PQA processing. For example, when joining inference graphs 610A, 610B, the answers Mt. Everest and Sir Edmund Hillary become factors from which the question generator component 112 may generate a question to ascertain their relation and the confidence strength of the association. An example question 672 is:

-   Is Mount Everest associated with Edmund Hillary?

Using processing by the PQA system component 115, it is readily determined that there is an association between the answers Mt. Everest and Sir Edmund Hillary, as indicated by the “yes” answer 678 in the joiner 675. For example, the following justifying passage 620D is obtained from the searched (structured+unstructured) content:

-   On 29 May 1953, Hillary and Tenzing Norgay became the first climbers confirmed as having reached the summit of Mount Everest.

Having established the relationship between the answers Mt. Everest and Sir Edmund Hillary, the final inference graph of FIG. 17B shows a relation between these answers and a corresponding confidence, as supported by the justifying passage 620D.

FIG. 18 shows a further embodiment of the inference chaining system that includes a parallel implementation of PQA systems. FIG. 18 includes a system and method for generating inference graphs for discovering and justifying answers to inquiries according to the embodiments described herein. A parallel PQA service 350 implementing probabilistic QA systems 355A, 355B, . . . , 355N in parallel allows for scalable and efficient execution of the generative process on a computer system. As seen in FIG. 18, the output 317 of the question generation component 112 is generated as plural queries (questions), each serviced by a respective PQA system 355A, 355B, . . . , 355N of the parallel array of PQA systems, to reduce latency.
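The fan-out to parallel PQA systems can be sketched with Python's `concurrent.futures`. The PQA systems here are hypothetical callables, questions are assigned round-robin, and a thread pool is assumed to suit PQA calls that spend most of their time waiting on I/O (for example, on remote services).

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PQA system interface: question -> answer payload.
PQASystem = Callable[[str], str]


def answer_in_parallel(questions: list[str], pqa_systems: list[PQASystem]) -> list[str]:
    """Assign questions to PQA systems round-robin and run them concurrently.

    Results are returned in the same order as the questions.
    """
    if not pqa_systems:
        raise ValueError("at least one PQA system is required")
    with ThreadPoolExecutor(max_workers=len(pqa_systems)) as pool:
        futures = [
            pool.submit(pqa_systems[i % len(pqa_systems)], question)
            for i, question in enumerate(questions)
        ]
        return [future.result() for future in futures]


systems = [lambda q: f"A:{q}", lambda q: f"B:{q}"]
print(answer_in_parallel(["q1", "q2", "q3"], systems))
```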

FIG. 19 shows a system diagram depicting a high-level logical architecture and methodology of an embodiment of each PQA system 355. As shown in FIG. 19, the architecture 355 includes a query analysis module 320 implementing functions for receiving and analyzing an input text query or question 319. In the depicted embodiment, the question generation component of a text-based programmed inference chaining system as described herein generates the query 319, e.g., from factors. A candidate answer generation module 330 is provided to implement a search for candidate answers by traversing structured, semi-structured and unstructured sources, e.g., content contained in a primary sources module 311 and/or in an answer source knowledge base module 321 containing, for example, collections of relations and lists extracted from primary sources. All the sources of information can be locally stored or distributed over a network, including a public network, e.g., the Internet or World-Wide-Web. The candidate answer generation module 330 generates a plurality of output data structures containing candidate answers based upon the analysis of retrieved data. In FIG. 19, one embodiment is depicted that includes an evidence gathering module 370 interfacing with the primary sources 311 and knowledge base 321 for concurrently analyzing the evidence based on passages having candidate answers, and scoring each of the candidate answers as parallel processing operations, as described in commonly-owned, co-pending U.S. patent application Ser. Nos. 12/152,411 and 12/126,642, for example, the whole disclosures of each of which are incorporated by reference as if fully set forth herein.

In one embodiment, the architecture may be employed utilizing common analysis structure (CAS) candidate answer structures and implementing supporting passage retrieval operations. For this processing, the evidence gathering module 370 implements the supporting passage retrieval operations and the candidate answer scoring in separate processing modules for concurrently analyzing the passages and scoring each of the candidate answers as parallel processing operations. The knowledge base 321 includes content, e.g., one or more databases of structured or semi-structured sources (pre-computed or otherwise), and may include collections of relations (e.g., Typed Lists). In an example implementation, the answer source knowledge base may comprise a database stored in a memory storage system, e.g., a hard drive. An answer ranking module 360 provides functionality for ranking candidate answers, i.e., computing a confidence value, and for determining a response 399 that is returned to the engine, along with the respective confidences, for potentially extending the inference graph with nodes and relations. The response may be an answer, an elaboration of a prior answer, or, when a high-quality answer to the question is not found, a request for clarification.
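An answer ranking module of this kind could compute confidences with a model over evidence scores. The following sketch assumes a logistic combination of hypothetical feature scores with hand-picked weights; the feature names, weights and bias are all illustrative, not values from the described system.

```python
import math


def rank_candidates(
    features: dict[str, dict[str, float]],
    weights: dict[str, float],
    bias: float = 0.0,
) -> list[tuple[str, float]]:
    """Turn per-candidate evidence scores into confidences and rank them.

    Confidence is the logistic function of bias + sum(weight * score);
    features without a weight contribute nothing.
    """
    ranked = []
    for candidate, scores in features.items():
        z = bias + sum(weights.get(name, 0.0) * value for name, value in scores.items())
        ranked.append((candidate, 1.0 / (1.0 + math.exp(-z))))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical evidence scores and weights.
FEATURES = {
    "Edmund Hillary": {"passage_score": 0.9, "type_match": 1.0},
    "Mount Everest": {"passage_score": 0.6, "type_match": 0.0},
}
WEIGHTS = {"passage_score": 2.0, "type_match": 1.5}
ranking = rank_candidates(FEATURES, WEIGHTS, bias=-1.0)
print(ranking)
```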

In one embodiment, the system shown in FIG. 19 may employ one or more modules for enabling I/O communication between a user or computer system and the system 10 according to, but not limited to, the modalities of text, audio, video, gesture, tactile input and output, etc. Thus, in one embodiment, both an input query and a generated query response may be provided in accordance with one or more of multiple modalities including text, audio, image, video, tactile or gesture.

FIG. 20 illustrates an exemplary hardware configuration of a computing system 401 in which the present system and method may be employed. The hardware configuration preferably has at least one processor or central processing unit (CPU) 411. The CPUs 411 are interconnected via a system bus 412 to a random access memory (RAM) 414, read-only memory (ROM) 416, input/output (I/O) adapter 418 (for connecting peripheral devices such as disk units 421 and tape drives 440 to the bus 412), user interface adapter 422 (for connecting a keyboard 424, mouse 426, speaker 428, microphone 432, and/or other user interface device to the bus 412), a communication adapter 434 for connecting the system 400 to a data processing network, the Internet, an Intranet, a local area network (LAN), etc., and a display adapter 436 for connecting the bus 412 to a display device 438 and/or printer 439 (e.g., a digital printer or the like).

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a system, apparatus, or device running an instruction.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a system, apparatus, or device running an instruction.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Thus, in one embodiment, the system and method for efficient passage retrieval may be performed with data structures native to various programming languages such as Java and C++.

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more operable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

What is claimed is:
 1. A method of inferring answers to inquiriescomprising: receiving an input inquiry; decomposing the input inquiry toobtain one or more factors, said factors forming initial nodes of aninference graph; iteratively constructing said inference graph over oneor more content sources, wherein at each iteration, a processing devicediscovers answers to said input inquiry by connecting factors to saidanswers via one or more relations, each relation in an inference graphbeing justified by one or more passages from said content sources, saidinference graph connecting factors to said answers over one or morepaths having one or more edges representing said relations; and,providing an answer to said inquiry from said inference graph, wherein aprogrammed processor device is configured to perform one or more saidreceiving, decomposing and said iteratively constructing said inferencegraph to provide said answer.
 2. The method as claimed in claim 1,wherein said iteratively constructing said inference graph comprises:expanding said inference graph at each iteration by: generating one ormore questions based on one or more current nodes in said graph;searching in one or more content sources to identify one or morerelations leading to new answers and representing said new answers asnew additional nodes in said inference graph, each new additional nodeconnected via an edge representing the relation, and each relationhaving an associated justifying passage at an associated confidencelevel, inferring, from said associated confidence levels, a confidencelevel at each node of said inference graph to provide an updatedinference graph, determining if the updated inference graph meets acriteria for terminating said iteration, and one of: terminating saiditeration if said criteria is met; otherwise, repeating said generating,searching, inferring and determining steps with said new additionalnodes being current nodes for a next iteration, wherein, uponterminating, said answer to said inquiry is a node from said updatedinference graph.
 3. The method as claimed in claim 2, wherein saidsearching comprises: identifying one or more justifying passagessupporting a relation between connected nodes of said inference graph.4. The method as claimed in claim 2, wherein said terminating criteriacomprises: identifying a node of said updated inference graph having aninferred confidence value exceeding a predetermined threshold; or,performing a predetermined number of iterations.
 5. The method asclaimed in claim 2, wherein said inferring a confidence level comprises:forming a Bayesian network from nodes and relations of said inferencegraph and an associated confidence value representing a probability ofbelief that a supporting passage justifies the answer for the node; and,in each answer propagating associated confidence values across saidrelations and nodes represented in said Bayesian network.
 6. The method as claimed in claim 2, wherein said factors or current nodes comprise a statement, said generating questions comprising: determining a predetermined relation type corresponding to the statement; and, using a template corresponding to the predetermined relation type to form a question from said statement.
 7. The method as claimed in claim 2, wherein said factors comprise statements, said method further comprising, at each iteration, one or more of: prioritizing selected statements as factors for expedient corresponding question generation; or filtering selected statements and removing them as factors for corresponding question generation.
 8. The method as claimed in claim 2, wherein said decomposing the input inquiry comprises: analyzing a text of said question; identifying said one or more factors from said analyzing; and applying weights to said one or more factors.
 9. The method as claimed in claim 2, further comprising: decomposing the input inquiry into query terms, and using said query terms to obtain one or more candidate answers for said input inquiry; performing as parallel simultaneous operations: iteratively constructing, by the programmed processor device, a first inference graph from factors obtained from the input inquiry, a constructed first inference graph connecting factors to one or more nodes that lead to an answer for said inquiry over one or more paths having one or more edges representing said relations; and iteratively constructing, by said programmed processor device, a second inference graph from said candidate answers, said second inference graph connecting said candidate answers to one or more nodes that lead to said one or more factors of said inquiry over one or more paths having one or more edges representing relations; determining, during said simultaneous iterative constructing, whether a first inference graph can be joined to said second inference graph to generate a final inference graph having a node representing an answer to said input inquiry.
 10. The method as claimed in claim 9, wherein said determining whether said first inference graph can be joined to said second inference graph comprises: determining, using a similarity criterion applied to end-point nodes of each said first and said second inference graphs, whether two said end-point nodes can be merged into a single node to join said graphs; or forcing a discovering of a relation that forms an edge joining an end-point node of said first inference graph to an end-point answer node in said second inference graph.
 11. A method of inferring answers to inquiries comprising: receiving an input inquiry; decomposing the input inquiry to obtain one or more factors; and, decomposing the input inquiry into query terms, and using said query terms to obtain one or more candidate answers for said input inquiry; iteratively constructing, using a programmed processor device coupled to a content storage source having content, a first inference graph using said factors as initial nodes of said first inference graph, a constructed first inference graph connecting factors to one or more nodes that lead to an answer for said inquiry over one or more paths having one or more edges representing said relations; simultaneously iteratively constructing, using the programmed processor device and the content source, a second inference graph using said one or more candidate answers as initial nodes of said second inference graph, said second inference graph connecting candidate answers to one or more nodes that connect to said one or more factors of said inquiry over one or more paths having one or more edges representing relations; and, generating, during said simultaneous iterative constructing, a final inference graph by joining said first inference graph to said second inference graph, said final inference graph having a joined node representing an answer to said input inquiry.
 12. The method as claimed in claim 11, wherein said iteratively constructing each said first inference graph and said second inference graph (inference graph) comprises expanding each inference graph at each iteration by: generating one or more questions based on one or more current nodes in said graph; searching in one or more content sources to identify one or more relations leading to new answers and representing said new answers as new additional nodes in said inference graph, each new additional node connected via an edge representing the relation, and each relation having an associated justifying passage at an associated confidence level; inferring, from said associated confidence levels, a confidence level at each node of said inference graph to provide an updated inference graph; determining if the updated inference graph meets a criterion for terminating said iteration; and one of: terminating said iteration if said criterion is met; otherwise, repeating said generating, searching, inferring and determining steps with said new additional nodes being current nodes at a next iteration, wherein, upon terminating, said answer to said inquiry is a node from said updated inference graph.
 13. The method as claimed in claim 12, wherein said generating the final inference graph comprises: determining, using a similarity criterion applied to end-point nodes of each said first and said second inference graphs, whether two said end-point nodes can be merged into a single node that joins said first inference graph and said second inference graph.
 14. The method as claimed in claim 13, wherein said determining using a similarity criterion comprises: applying one or more of: term matching or co-referencing to identify one or more of: a syntactic, semantic or contextual similarity between said identified end-point node of said first inference graph and an end-point node of said second inference graph, and merging said identified end-point nodes meeting one or more of: a syntactic, semantic or contextual similarity criteria.
 15. The method as claimed in claim 12, wherein said generating a final inference graph comprises: forcing the discovering of a relation that forms an edge joining an end-point node of said first inference graph to an end-point answer node in said second inference graph.
 16. The method as claimed in claim 15, wherein said forcing the discovering of a relation that forms an edge comprises: generating, from an end-point factor node of said first inference graph to an end-point candidate answer node in said second inference graph, one of: a “yes”/“no” or multiple-choice question, and using said generated “yes”/“no” or multiple-choice question to determine whether a relation between said respective end-point nodes exists, said relation joining a candidate answer to a factor of the input inquiry.
 17. The method as claimed in claim 11, wherein said query terms include searchable components, said obtaining candidate answers comprising: conducting a search over content from one or more content sources using one or more of the searchable components to obtain candidate answers used as said initial nodes for said second graph constructing.
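The iterative expansion loop of claims 2 and 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Relation`, `InferenceGraph`, `search_relations` and `expand_inference_graph` names are hypothetical, the "generate questions, then search content" step is replaced by a lookup over a toy in-memory list of pre-extracted relations, and node confidence is approximated by the best path product rather than a Bayesian network.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A justified edge: source --relation--> target, backed by a passage."""
    source: str
    relation: str
    target: str
    passage: str
    confidence: float  # confidence that the passage justifies the relation


@dataclass
class InferenceGraph:
    confidence: dict = field(default_factory=dict)  # node -> inferred confidence
    edges: list = field(default_factory=list)       # discovered Relation objects


def search_relations(node, corpus):
    """Hypothetical stand-in for question generation plus content search:
    returns every stored relation whose source is the current node."""
    return [r for r in corpus if r.source == node]


def expand_inference_graph(factors, corpus, threshold=0.5, max_iterations=3):
    """Expand from the factors until some non-factor node reaches `threshold`
    or `max_iterations` rounds have run (the two criteria of claim 4)."""
    graph = InferenceGraph(confidence={f: 1.0 for f in factors})
    current, best = list(factors), None
    for _ in range(max_iterations):
        new_nodes = []
        for node in current:
            for rel in search_relations(node, corpus):
                graph.edges.append(rel)
                if rel.target not in graph.confidence:
                    new_nodes.append(rel.target)
                # Assumed inference rule: keep the best path product.
                score = graph.confidence[node] * rel.confidence
                graph.confidence[rel.target] = max(score, graph.confidence.get(rel.target, 0.0))
        answers = {n: c for n, c in graph.confidence.items() if n not in factors}
        if answers:
            best = max(answers, key=answers.get)
            if answers[best] >= threshold:
                break  # confidence criterion met
        if not new_nodes:
            break  # nothing left to expand
        current = new_nodes  # new nodes become current nodes for the next round
    return graph, best


# Toy content: hand-written relations standing in for searched passages.
corpus = [
    Relation("fever", "symptom_of", "flu", "Fever is a common flu symptom.", 0.8),
    Relation("cough", "symptom_of", "flu", "Flu often causes a cough.", 0.7),
    Relation("flu", "treated_by", "rest", "Rest helps recovery from flu.", 0.9),
]
graph, answer = expand_inference_graph(["fever", "cough"], corpus, threshold=0.75)
print(answer, graph.confidence[answer])  # flu 0.8
```

With `threshold=0.75` the loop stops after the first round because "flu" already exceeds the threshold, so "rest" is never added; a higher threshold lets the graph grow one more hop before the iteration budget ends it.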
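Claim 5 propagates passage confidences through a Bayesian network built from the inference graph. The sketch below is one simple way to do this, assuming a noisy-OR model (each parent independently supports a child with probability belief(parent) × edge weight) and an acyclic graph; the noisy-OR choice and the `noisy_or_propagate` name are assumptions for illustration, not the model specified in the claims. It relies only on the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def noisy_or_propagate(factor_belief, edges):
    """Propagate belief from factor nodes through a DAG of justified relations.

    factor_belief: node -> prior belief for the factors of the inquiry.
    edges: (parent, child, weight) triples; weight is the confidence that the
           supporting passage justifies the relation.
    Assumed model: noisy-OR, so a child is believed unless every parent
    independently fails to support it.
    """
    incoming = {node: [] for node in factor_belief}
    for parent, child, weight in edges:
        incoming.setdefault(parent, [])
        incoming.setdefault(child, []).append((parent, weight))
    dependencies = {node: {p for p, _ in ins} for node, ins in incoming.items()}
    belief = {}
    # Parents are always visited before children; a cycle raises CycleError.
    for node in TopologicalSorter(dependencies).static_order():
        if node in factor_belief:
            belief[node] = factor_belief[node]
            continue
        p_none = 1.0  # probability that no parent supports this node
        for parent, weight in incoming[node]:
            p_none *= 1.0 - belief[parent] * weight
        belief[node] = 1.0 - p_none
    return belief


edges = [("fever", "flu", 0.8), ("cough", "flu", 0.7), ("flu", "rest", 0.9)]
belief = noisy_or_propagate({"fever": 1.0, "cough": 1.0}, edges)
print(round(belief["flu"], 3), round(belief["rest"], 3))  # 0.94 0.846
```

Unlike a best-path rule, noisy-OR lets two independent supporting relations raise a node's belief above either one alone (here 0.94 versus 0.8 for the strongest single path).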
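The template-based question generation of claim 6 (determine a relation type for a statement, then fill that type's template) might look like the following sketch. The trigger-word lexicon, the two relation types and the template wording are all hypothetical placeholders; a real system would use a trained relation classifier and a much larger template set.

```python
import re

# Hypothetical lexicon mapping trigger words to predetermined relation types.
RELATION_TRIGGERS = {
    "symptom": ("fever", "cough", "rash", "pain"),
    "condition": ("flu", "infection", "disease"),
}

# Hypothetical question templates, one per relation type.
TEMPLATES = {
    "symptom": "What causes {statement}?",
    "condition": "What is a treatment for {statement}?",
}


def relation_type(statement):
    """Return the first relation type whose trigger words occur in the statement."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    for rtype, triggers in RELATION_TRIGGERS.items():
        if words & set(triggers):
            return rtype
    return None


def generate_question(statement):
    """Form a question from a statement using its relation type's template."""
    rtype = relation_type(statement)
    if rtype is None:
        return None  # no template applies; the statement could be filtered (claim 7)
    return TEMPLATES[rtype].format(statement=statement.rstrip("."))


print(generate_question("a persistent cough"))  # What causes a persistent cough?
print(generate_question("the flu"))             # What is a treatment for the flu?
```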
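Claims 9 and 11 grow a forward graph from the factors and a backward graph from the candidate answers at the same time, and the abstract describes a depth controller that stops both when they do not meet. A minimal lockstep sketch is shown below, assuming the relations have already been discovered and are given as plain adjacency dictionaries; `bidirectional_join`, `forward_adj`, `backward_adj` and `max_depth` are hypothetical names, the two sides advance in alternating rounds rather than truly in parallel, and "meeting" is exact node identity rather than the similarity test of claims 13 and 14.

```python
def bidirectional_join(forward_adj, backward_adj, factors, candidates, max_depth=3):
    """Expand from factors (forward) and candidate answers (backward), one hop
    per side per round, until the two node sets share a node or the depth
    limit is reached. Returns (meeting_nodes, depth) or ([], None)."""
    fwd_seen, bwd_seen = set(factors), set(candidates)
    fwd_frontier, bwd_frontier = set(factors), set(candidates)
    for depth in range(max_depth + 1):
        meeting = fwd_seen & bwd_seen
        if meeting:
            return sorted(meeting), depth
        if depth == max_depth:
            break  # depth controller: give up and leave it to a joiner
        fwd_frontier = {n for f in fwd_frontier for n in forward_adj.get(f, ())} - fwd_seen
        bwd_frontier = {n for b in bwd_frontier for n in backward_adj.get(b, ())} - bwd_seen
        if not fwd_frontier and not bwd_frontier:
            break  # neither side can grow any further
        fwd_seen |= fwd_frontier
        bwd_seen |= bwd_frontier
    return [], None


forward_adj = {"fever": ["flu"], "flu": ["viral infection"]}
backward_adj = {"oseltamivir": ["influenza treatment"], "influenza treatment": ["flu"]}
print(bidirectional_join(forward_adj, backward_adj, ["fever"], ["oseltamivir", "aspirin"]))
# (['flu'], 2)
```

With `max_depth=1` the same inputs return `([], None)`, which is the case where claims 10 and 15 fall back to forcing the discovery of a joining relation.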
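For the merge test of claims 10, 13 and 14, a rough sketch of term matching between end-point nodes is given below. It assumes node labels are short text strings and uses token-set Jaccard overlap after dropping a tiny stopword list; the Jaccard measure, the stopword list, the 0.5 threshold and the `find_join` name are illustrative assumptions, and co-reference resolution is not shown.

```python
import re

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of"}


def tokens(label):
    """Lower-cased content words of a node label."""
    return set(re.findall(r"[a-z0-9]+", label.lower())) - STOPWORDS


def jaccard(a, b):
    """Token-set Jaccard similarity between two node labels."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_join(forward_ends, backward_ends, threshold=0.5):
    """Return the most similar (forward, backward, score) end-point pair whose
    similarity meets the threshold, or None if no pair can be merged."""
    best = None
    for f in forward_ends:
        for b in backward_ends:
            score = jaccard(f, b)
            if score >= threshold and (best is None or score > best[2]):
                best = (f, b, score)
    return best


print(find_join(["influenza virus", "high temperature"],
                ["the influenza", "bacterial infection"]))
# ('influenza virus', 'the influenza', 0.5)
```

A `None` result corresponds to the other branch of claim 10: no pair of end-point nodes is similar enough to merge, so a joining relation must be discovered explicitly.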
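The forced relation discovery of claims 15 and 16 poses a "yes"/"no" or multiple-choice question between an end-point factor node and an end-point candidate answer node. The sketch below shows the question forms and the decision step, assuming a pluggable `score_yes` callable that returns the probability of a "yes" answer; the question wording, the `force_join` name and the toy co-occurrence scorer over a one-sentence passage list are hypothetical stand-ins for the probabilistic question answering system.

```python
def yes_no_question(factor, answer):
    """Hypothetical yes/no question linking a factor to a candidate answer."""
    return f"Is {answer} related to {factor}?"


def multiple_choice_question(factor, answers):
    """Hypothetical multiple-choice question over several candidate answers."""
    options = ", ".join(f"({chr(ord('A') + i)}) {a}" for i, a in enumerate(answers))
    return f"Which of the following is related to {factor}? {options}"


def force_join(factor_ends, answer_ends, score_yes, threshold=0.5):
    """Ask a yes/no question for every (factor, answer) end-point pair and keep
    the pairs whose estimated probability of "yes" meets the threshold."""
    joins = []
    for factor in factor_ends:
        for answer in answer_ends:
            p_yes = score_yes(yes_no_question(factor, answer), factor, answer)
            if p_yes >= threshold:
                joins.append((factor, answer, p_yes))
    return joins


# Toy scorer: "yes" if both terms occur in the same passage.
PASSAGES = ["Oseltamivir is an antiviral used against influenza."]


def cooccurrence_scorer(question, factor, answer):
    hits = sum(factor.lower() in p.lower() and answer.lower() in p.lower() for p in PASSAGES)
    return 1.0 if hits else 0.0


print(force_join(["influenza"], ["oseltamivir", "aspirin"], cooccurrence_scorer))
# [('influenza', 'oseltamivir', 1.0)]
print(multiple_choice_question("influenza", ["oseltamivir", "aspirin"]))
# Which of the following is related to influenza? (A) oseltamivir, (B) aspirin
```

Each returned triple is a new edge that joins a candidate answer to a factor of the input inquiry, with the "yes" probability available as that edge's confidence.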